4.1 Introduction:


The RPI task has been concerned with the development of expert systems techniques
for automated photointerpretation. More specifically, our efforts have been directed toward
the development, implementation and demonstration of techniques which will mimic the
job of a trained photoanalyst in interpreting objects in monochrome, single-frame aerial
images. This is a difficult task which requires a combination of numerical and symbolic
image processing techniques.

During the course of this effort we have developed a novel hierarchical, region-based
approach to automated photointerpretation (cf. [1]). Basically, this approach proceeds
by first segmenting the input image into disjoint regions which differ in tonal or textural
properties. The spatial relationships between different regions are then expressed in terms
of the associated adjacency graph, where nodes represent regions and the connectivity
indicates regions which are spatially contiguous. Based upon knowledge of the underlying
spatial adjacency graph, together with various self and mutual region attributes or features,
the problem is then that of assigning interpretations, or object categories, to each of the
nodes. This is generally a computationally explosive task. The novelty of our approach
is that we have been able to develop a computationally feasible approach to this symbolic
interpretation process.

The advantage of our approach is based upon two important properties. First, we
model the interpretation process as a Markov random field (MRF) defined on the adjacency
graph. Second, we make use of an efficient stochastic relaxation process to find the
most likely interpretation. The first property allows us to localize the search for good
interpretations, while the second helps avoid the otherwise computationally explosive
search for optimum interpretations.

Our major effort during FY'87 has been in refining this hierarchical, region-based approach,
improving the initial segmentation process and, finally, demonstrating the approach on
real-world aerial photographs. The present report documents this progress over
the last year.

This final report is organized as follows. In the remainder of this section we provide
an overview of the current status of our hierarchical, region-based approach to automated
photointerpretation. This is followed, in Section 4.2, by a detailed development of an
unsupervised tonal segmentation scheme under a Gaussian modeling assumption. In Section
4.3 we describe a corresponding texture segmentation technique based upon MRFs.
Subsequently, in Section 4.4, we describe a novel approach, based upon information-theoretic
concepts, for determining the number of distinct classes in an image. This latter
issue is crucial to any fully automated image interpretation scheme. Finally, in Section
4.5, we provide a summary and an outline of research directions for FY'88.

4.1.1  Image  Interpretation  Approach: 

In  this  section  we  will  describe  the  current  status  of  our  automated  photointerpretation 
system,  review  the  pertinent  details  of  the  evolving  testbed  which  will  support  it  and 
illustrate  some  typical  results  obtained  so  far. 

A block diagram of the overall testbed structure is illustrated in Fig. 4.1-1. The main
function of the preprocessor is to provide a segmentation of the image into disjoint regions
which are homogeneous within a region but differ in some sense from adjacent regions. In
the next several sections, we describe various segmentation schemes investigated for this
purpose. For the time being, then, we assume that a segmentation has been obtained.

Once a segmentation is obtained, however preliminary, the regions are indexed and
region maps are stored in the image database. That is, the actual pixel values associated with
each region are stored separately for each region. In addition, various attributes associated
with each region are stored, including such parameters as area, perimeter, boundary,
elongation, etc. The spatial relationships between the various regions are also
maintained. This is most easily done by using an adjacency graph where the nodes correspond
to regions and the connectivity indicates spatial relationships. In particular, two nodes
are connected by an arc, or edge, if they are in some sense spatial neighbors. The values
associated with arcs can include pairwise information about the connected nodes, such as
mutual boundaries, spatial distances, strength of mutual edges, etc. Image interpretations
are provided by the inferencing mechanism, which has access to the region information
stored in the image database as well as the world knowledge stored in the knowledge
database. Feedback to the image preprocessor is through the inferencing mechanism.
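To make the adjacency-graph bookkeeping concrete, the following Python sketch builds a first-order region adjacency graph from a label map, treating two regions as neighbors when any of their pixels are 4-connected. The function name `region_adjacency`, the 4-connectivity choice and the use of shared-boundary length as the only arc attribute are illustrative assumptions, not the testbed's actual data structures.

```python
def region_adjacency(labels):
    """Build a first-order region adjacency graph from a 2-D label map.

    labels: list of equal-length rows of region labels.
    Returns (neighbours, boundary):
      neighbours[a]    -> set of regions spatially contiguous with region a
      boundary[(a, b)] -> shared boundary length in pixel edges, with a < b
    """
    neighbours = {}
    boundary = {}
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            neighbours.setdefault(a, set())  # isolated regions still become nodes
            # Look right and down only, so each pixel pair is examined once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if b != a:
                        neighbours[a].add(b)
                        neighbours.setdefault(b, set()).add(a)
                        key = (min(a, b), max(a, b))
                        boundary[key] = boundary.get(key, 0) + 1
    return neighbours, boundary


# Hypothetical 3 x 4 segmentation with regions 1, 2 and 3.
labels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 2],
]
neighbours, boundary = region_adjacency(labels)
print(neighbours)  # each region touches the other two
print(boundary)
```

Other arc attributes mentioned above, such as spatial distances or edge strengths, could be accumulated in the same pass over the label map.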

It should be noted from Fig. 4.1-1 that the testbed allows operator intervention
through an interactive image processing and display terminal. More specifically, the
operator can manually extract regions using a joystick or trackball and, if desired, actually
provide interpretations of the various extracted regions. Once the disjoint regions are
outlined by the operator, the various region attributes are automatically extracted and stored
in the image database in exactly the same format as if they had been extracted
by the image preprocessor. Furthermore, in cases where the operator provides region
interpretations, the relevant spatial relationships are provided to the knowledge database,
allowing our world knowledge to be updated.

Now suppose that an appropriate initial segmentation is obtained. Let the distinct
regions be labeled $R_1, R_2, \ldots, R_N$ as, for example, in Fig. 4.1-2 where $N = 7$. The
corresponding first-order adjacency graph associated with this segmented image then appears
as indicated in Fig. 4.1-3. By first-order adjacency we mean here that regions are adjacent,
or are neighbors, if and only if they are spatially contiguous. The problem is now: given
an initial segmentation, to provide a global interpretation for each of the nodes given the
measurement attributes associated with each node, the context information associated with the
mutual relationships specified in the adjacency graph, and the world knowledge prescribed
in the knowledge database. A detailed description of our approach to implementing this
interpretation function was provided previously in [1]. As a result, the following discussion
of the major characteristics of this approach will be abbreviated, and the reader is referred
to the more extensive development in [1] for details.

Suppose then that the segmented regions within the image are labeled $R_1, R_2, \ldots, R_N$
and let $I_1, I_2, \ldots, I_N$ be the corresponding global interpretations given to each of these
regions, where $I_i \in \{\phi, 1, 2, \ldots, K\}$. Here, we have $K$ specific object types whose labels
are to be assigned to each of the regions, plus the ambiguous or irrelevant object type
represented by the label or symbol $\phi$. Suppose we define the region information as
$R = (R_1, R_2, \ldots, R_N)$ and the interpretation vector $I = (I_1, I_2, \ldots, I_N)$. Note that there
are at most $(K+1)^N$ possible interpretation vectors although, in reality, there are many fewer
than this, since a valid global interpretation should not allow neighboring, or adjacent,
regions to carry identical labels except for the uncertain symbol $\phi$. The exact number
of interpretation vectors will then depend specifically upon the spatial arrangement of
regions and is thus a random variable.
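A brute-force enumeration, practical only for a handful of regions, shows how the legitimacy constraint prunes the $(K+1)^N$ candidates. In this hypothetical sketch the ambiguous label $\phi$ is encoded as 0 and the object types as 1..K; a vector is rejected if two adjacent regions share a non-$\phi$ label.

```python
from itertools import product

PHI = 0  # hypothetical encoding of the ambiguous label phi; object types are 1..K


def legitimate_interpretations(neighbours, K):
    """Yield every interpretation vector (as a dict region -> label) in which no
    two adjacent regions carry the same non-phi label."""
    nodes = sorted(neighbours)
    for labels in product(range(K + 1), repeat=len(nodes)):
        I = dict(zip(nodes, labels))
        if all(I[a] == PHI or I[a] != I[b]
               for a in nodes for b in neighbours[a]):
            yield I


# Three mutually adjacent regions, K = 2 object types.
neighbours = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
valid = list(legitimate_interpretations(neighbours, K=2))
print(len(valid), (2 + 1) ** 3)  # 13 27
```

For three mutually adjacent regions and K = 2, only 13 of the 27 candidate vectors survive; for regions with no neighbors, nothing is pruned.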

Our criterion will be to choose the estimated global interpretation $\hat{I} = I_0$ iff

$$ I_0 = \arg\max_{I} \; p\{I \mid Z, K, X\} \qquad (1) $$

Here, $Z$ represents information describing the partitioning into regions, $K$ represents
information in the knowledge database and $X$ represents the corresponding adjacency graph,
which includes all measurement information, both for each region separately and
mutual measurement information between regions. The quantity $p\{I \mid Z, K, X\}$ represents
the conditional probability of $I$ given $Z$, $K$ and $X$. This quantity may be difficult to specify
theoretically, but the work in [1] provided a useful theoretical framework for specifying the
structure of this conditional probability. The optimization in (1) is then over all legitimate
interpretation vectors; the resulting estimate is called the maximum a posteriori (MAP)
estimate and is well founded in statistical decision and estimation theory [2].

At this point we will make the assumption that, conditioned on $Z$, $K$ and $X$, the
interpretation vector $I$ is a Markov random field (MRF) defined on the corresponding
adjacency graph. The concept of an MRF defined on a 2-D lattice has provided a useful
model for images. However, as pointed out in [3], the concept of an MRF need not be
restricted to lattices but can be defined on more general structures such as graphs. Thus,
it appears quite natural to define the interpretation vector $I$ as an MRF on the
associated adjacency graph.

Under the assumption that $I$ is a conditional MRF, it is well known, through
the equivalence of MRFs with Gibbs random fields (GRFs), that the conditional
probability must be of the form

$$ p\{I \mid Z, K, X\} = \frac{e^{-U(I;\,Z,K,X)}}{\mathcal{Z}}, \qquad (2) $$

where $U(I; Z, K, X)$ is the associated Gibbs energy function and $\mathcal{Z}$ (written this way to
distinguish it from the partitioning information $Z$) is the corresponding partition function,
which serves the role of a normalization constant. More specifically, we have

$$ \mathcal{Z} = \sum_{I} e^{-U(I;\,Z,K,X)}, \qquad (3) $$

where the summation is over all legitimate interpretation vectors. The energy function
must then be designed to take into account the information represented by $R$, $K$ and $X$.
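For a very small graph the partition function can be evaluated by exhaustive enumeration, which makes the normalization in (2) explicit. The sketch below assumes the same hypothetical 0-for-$\phi$ encoding and a caller-supplied energy function; the per-region costs in the example are invented purely for illustration. Because the partition function does not depend on $I$, the most probable vector is simply the legitimate vector of lowest energy.

```python
import math
from itertools import product

PHI = 0  # hypothetical encoding of the ambiguous label phi; object types are 1..K


def is_legitimate(I, neighbours):
    """True if no two adjacent regions share a non-phi label."""
    return all(I[a] == PHI or I[a] != I[b] for a in neighbours for b in neighbours[a])


def gibbs_distribution(neighbours, K, energy):
    """Return {labels tuple: p(I)} with p(I) = exp(-U(I)) / partition function,
    enumerating every legitimate I. Feasible only for tiny graphs."""
    nodes = sorted(neighbours)
    weights = {}
    for labels in product(range(K + 1), repeat=len(nodes)):
        I = dict(zip(nodes, labels))
        if is_legitimate(I, neighbours):
            weights[labels] = math.exp(-energy(I))
    partition = sum(weights.values())  # normalization constant
    return {labels: w / partition for labels, w in weights.items()}


# Two adjacent regions, K = 1; invented per-region costs favoring label 1.
neighbours = {1: {2}, 2: {1}}
unary = {1: {PHI: 1.0, 1: 0.0}, 2: {PHI: 1.0, 1: 0.0}}
p = gibbs_distribution(neighbours, 1, lambda I: sum(unary[i][I[i]] for i in I))
map_labels = max(p, key=p.get)  # lowest-energy legitimate vector
print(p, map_labels)
```

Both regions would prefer label 1, but the legitimacy constraint forbids that pair, so the probability mass splits evenly between (φ, 1) and (1, φ). The number of terms grows as $(K+1)^N$, which is why enumeration is not an option for real images.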

As can be seen from (1) and (2), the MAP estimate is obtained by minimizing the
energy function. This is a difficult combinatorial problem since, as we have noted
previously, there are as many as $(K+1)^N$ possible interpretation vectors $I$. In [1] we proposed
and described the use of a stochastic relaxation procedure, called simulated annealing, to
overcome these combinatorial problems. More specifically, simulated annealing was used
to obtain the maximum of $p\{I \mid Z, K, X\}$.
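The following is a minimal Metropolis-style simulated annealing sketch over the adjacency graph, assuming only singleton and pairwise clique terms. It is not the procedure of [1]: the geometric cooling schedule, the uniform single-site proposals, the all-$\phi$ starting point and the parameter values are all our own assumptions. Evaluating a proposed change at node $i$ requires only the terms involving $i$ and its neighbors, which is where the MRF assumption pays off.

```python
import math
import random

PHI = 0  # hypothetical encoding of the ambiguous label phi; object types are 1..K


def local_energy(i, label, I, unary, neighbours, pair):
    """Energy terms that involve node i: its singleton term plus its pair terms.
    Returns math.inf if the label clashes with a neighbour (illegitimate I)."""
    u = unary[i][label]
    for j in neighbours[i]:
        if label != PHI and label == I[j]:
            return math.inf
        u += pair(label, I[j])
    return u


def anneal(neighbours, K, unary, pair, T0=2.0, alpha=0.95, sweeps=200, seed=0):
    """Metropolis simulated annealing for a low-energy (high-probability) I."""
    rng = random.Random(seed)
    I = {i: PHI for i in neighbours}  # the all-phi vector is always legitimate
    T = T0
    for _ in range(sweeps):
        for i in neighbours:
            proposal = rng.randrange(K + 1)
            dU = (local_energy(i, proposal, I, unary, neighbours, pair)
                  - local_energy(i, I[i], I, unary, neighbours, pair))
            # Accept downhill moves; accept uphill moves with probability exp(-dU/T).
            # Illegitimate proposals have dU = inf and are never accepted.
            if dU <= 0 or rng.random() < math.exp(-dU / T):
                I[i] = proposal
        T *= alpha  # geometric cooling (an assumed schedule)
    return I


# A chain of three regions 1-2-3 with invented costs and no pairwise preferences.
neighbours = {1: {2}, 2: {1, 3}, 3: {2}}
unary = {1: {PHI: 1.0, 1: 0.0, 2: 2.0},
         2: {PHI: 1.0, 1: 2.0, 2: 0.0},
         3: {PHI: 1.0, 1: 0.0, 2: 2.0}}
result = anneal(neighbours, 2, unary, pair=lambda a, b: 0.0)
print(result)
```

In general annealing only approaches the MAP interpretation; how close it gets depends largely on the cooling schedule.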

Now consider the choice of a Gibbs energy function. It is well known (cf. [3]) that this
must be of the form

$$ U(I; Z, K, X) = \sum_{c \in \mathcal{C}} V_c(I_c; Z, K, X), \qquad (4) $$

where $V_c(I_c; Z, K, X)$ is called a clique function and the summation in (4) is over all
possible cliques $c$ in the set $\mathcal{C}$, with $I_c$ the restriction of $I$ to the clique $c$. Cliques and
clique functions are described in more detail in [1]. In particular, we showed that the
summation in (4) can be rewritten as

$$ U(I; Z, K, X) = \sum_{i=1}^{N} \sum_{c \in \mathcal{C}_i} V_c(I_c; Z, K, X). \qquad (5) $$

Here, the outer sum is over the individual nodes while the inner sum is over the set of
distinct cliques, $\mathcal{C}_i$, associated with node $i$, $i = 1, 2, \ldots, N$.
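Equation (5) is a regrouping of (4) in which each clique is charged to exactly one node. Assuming only singleton and pair cliques (first-order neighborhoods) and a symmetric pair function, one way to do this is to charge a pair clique $\{i, j\}$ to the node with the smaller index. The sketch below checks numerically that both orderings give the same energy; `V1` and `V2` are hypothetical clique functions.

```python
def energy_by_cliques(I, neighbours, V1, V2):
    """Eq. (4): sum of clique functions over all singleton and pair cliques."""
    edges = {(min(a, b), max(a, b)) for a in neighbours for b in neighbours[a]}
    return (sum(V1(i, I[i]) for i in neighbours)
            + sum(V2(I[a], I[b]) for a, b in edges))


def energy_by_nodes(I, neighbours, V1, V2):
    """Eq. (5): outer sum over nodes i; inner sum over the cliques C_i charged to i,
    here its singleton clique and the pair cliques {i, j} with j > i."""
    total = 0.0
    for i in neighbours:
        total += V1(i, I[i])
        total += sum(V2(I[i], I[j]) for j in neighbours[i] if j > i)
    return total


def V1(i, label):
    """Hypothetical singleton clique function."""
    return 0.1 * i * label


def V2(a, b):
    """Hypothetical symmetric pair clique function: penalize equal labels."""
    return 0.5 if a == b else 0.0


neighbours = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
I = {1: 1, 2: 0, 3: 0, 4: 2}
print(energy_by_cliques(I, neighbours, V1, V2), energy_by_nodes(I, neighbours, V1, V2))
```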

As pointed out in [1], the outstanding problem at this point is the determination
and specification of an appropriate set of clique functions. At that time we suggested
some ways in which these clique functions could be chosen in some simple illustrative
problems. During the past year we have studied several refinements in the selection of clique
functions and have applied this scheme to several sets of synthetic, as well as real-world,
images. Our work here is incomplete and we expect to actively pursue these investigations
throughout FY'88.


4.1.5 


In the following sections of this report we describe in some detail the rather
extensive work we completed in FY'87 on segmentation techniques. Again,
it must be emphasized that, regardless of the image interpretation technique employed,
the results are highly dependent on having a good initial segmentation.

References  for  Section  4.1 

1. J. W. Modestino, "A Hierarchical Region-Based Approach to Automated Photointerpretation,"
NAIC Final Report for FY'86.

2. H. L. Van Trees, Detection, Estimation and Modulation Theory, Part I, Wiley and Sons,
New York, 1968.

3. R. Kindermann and J. L. Snell, Markov Random Fields and Their Applications, American
Mathematical Society, Providence, RI, 1980.

Fig. 4.1-1 Automated Photointerpretation Testbed



Fig. 4.1-2 An Initial Segmentation of an Image

4.2 Unsupervised Image Segmentation Using a Gaussian Model:

A Gaussian random field model-based maximum-likelihood (ML) approach to image
segmentation is described in this section. In this approach, the segmentation problem is
formulated as a statistical decision problem under a Gaussian modeling assumption for
the different image classes. The model parameters are estimated directly from the observed
image, resulting in an unsupervised algorithm. The results of applying this algorithm to
the segmentation of aerial images are also described.

4.2.1  Background: 

Image segmentation is a very important problem in many image processing
applications. In an image segmentation problem, an observed image is separated into regions
of different properties. Two of the most important properties used are tone and texture
[1]. Tone is related to the average gray level of a region, while texture corresponds to the
spatial distribution of different gray levels in a region. As pointed out in [1], different
regions in an image sometimes exhibit mainly one or the other of these two properties.
When the spatial variation of gray levels in a region is small and uncorrelated, the region
is dominated by tone. On the other hand, if the spatial variation of gray levels is large or
correlated, the region is dominated by texture. Which property dominates is determined not
only by the particular image scene but, more often, by the resolution of the image. In this paper,
we are mainly concerned with images whose regions are dominated by tonal properties.
Surveys of texture segmentation techniques can be found in [1], [12], [14]. Examples of
the type of image for which tonal properties dominate can be found in many aerial
photographs. In these images, the regions correspond to roads and fields, which have little
texture to begin with, or to vegetation regions, which show little texture or gray-level variation
because of the low resolution of the image.

In many image analysis applications, image segmentation is the first stage of
processing and the quality of the segmentation is crucial to the overall performance of the system
[2]. This is particularly the case in our application, automated
photointerpretation [13]. Because of its importance in a wide variety of applications, a large number
of image segmentation techniques have been proposed. These techniques can be classified
into two different approaches: a statistical approach, where tonal or textural properties
are characterized in statistical terms, such as means, variances, correlation functions and
probability distribution functions; and a structural approach, where these image properties
are described by a properly defined formal language [1]. In this paper, we are interested
in a purely statistical approach to image segmentation where the image regions exhibit
mainly tonal properties.

Most of the previously proposed statistical techniques are heuristic or ad hoc, in that
they are either based on ad hoc arguments or derived from certain heuristics about
a specific set of images. The work of Haralick and Shapiro [3] provides a comprehensive
survey of most of the existing heuristic statistical image segmentation techniques,
ranging from the reasonably simple to the very complex. Although a number of them have
achieved considerable success in some specific and well-defined situations, they have
some unsatisfactory features. For example, it is often difficult to precisely define or choose
the parameters involved in these algorithms, such as the valleys of histograms in
various histogram-guided thresholding techniques or the closeness thresholds in most
region-growing algorithms. Many of the more sophisticated algorithms require an enormous amount
of computation. In addition, little is known, in general, about how effective these
algorithms are and to what type of images they can be applied. More specifically, there
is no explicit modeling assumption made for the image properties and, consequently, the
resulting solution cannot be shown to be optimal in any well-defined sense.

To overcome these difficulties, a number of stochastic model-based image
segmentation techniques have been proposed [4]-[8], [14]. In a stochastic model-based approach,
modeling assumptions are made for regions of different statistical properties in an image,
which we call classes. The segmentation problem is then formulated as a statistical
decision problem and an optimal solution is sought. As a result, the stochastic
model-based approach usually provides image segmentation techniques that are more generally
applicable and optimal according to some well-defined criterion.

Most of the stochastic model-based techniques, however, exploit textural properties
rather than tonal properties; hence these are texture segmentation techniques. One of the
few techniques which mainly makes use of tonal properties or, more precisely, attempts to
model tonal properties, is an algorithm proposed by Derin and Elliot [5]. In this technique,
different image regions are modeled by a constant gray level with additive white Gaussian
noise which has the same mean and variance over the entire image, while the
distribution of different regions is modeled by a Markov random field (MRF), or Gibbs random
field (GRF). The segmentation problem is then formulated as a maximum a posteriori
(MAP) estimation problem. The maximization of the a posteriori probability functional is
performed using an approximate dynamic programming procedure. This algorithm is
partially unsupervised in that the model parameters for the regions are estimated directly from
the observed image by the moment method of Gaussian mixture estimation, although the
model parameters for the MRF model which generates the regions must be pre-specified.
The choice of the underlying MRF parameters is made heuristically. While some successful
examples are shown in [5], this algorithm is computationally quite involved: both the dynamic
programming and the mixture estimation procedure require considerable computation. In
their approach, the image classes are modeled as having constant gray levels corrupted by
additive observation noise. This is a rather unrealistic assumption, since many image
regions that appear to have uniform gray levels have gray-level variation in them in addition
to the additive observation noise. Finally, it is not very clear how the model parameters for
the MRF of the region distribution should be selected. Recently, it has been shown that
the parameters for the MRF can be estimated through an EM (Expectation-Maximization)
type algorithm [12]. A disadvantage is that the amount of computation required is
quite large.

In this paper, we describe a novel stochastic model-based image segmentation
approach which provides a simpler alternative and overcomes some of the unsatisfactory
features of Derin and Elliot's technique. First of all, we model different image classes, or
region types, as independent Gaussian random fields with different spatially constant means
and variances. The constant mean of a class is used to model the flat gray level, or tone, of
the region, and the class-dependent variance is used to model the combined effects of
gray-level variation and additive observation noise, which is assumed to be zero mean for that
class. Assuming the variation of gray level in a region is relatively small, our model is a
tonal model. Unlike Derin and Elliot's algorithm, we do not make any assumptions about the
distribution of different regions in the image, since it is quite involved to estimate the MRF
model parameters and perform the MAP operation. This results in a maximum-likelihood
(ML) approach. By using the independence assumption, the likelihood functional can be
maximized through a highly parallel operation; even using a raster scan, this can be quite
simply done in one scan. Finally, the model parameters for the different image classes can be
estimated by using a computationally efficient clustering technique operating directly on
the observed image. Hence this approach is entirely unsupervised. This algorithm has been
applied to a set of aerial photographs and the results are shown to be quite promising.

4.2.2 The Gaussian Model and ML Segmentation:

In this paper, we consider an image as an array of gray levels defined on a
two-dimensional (2-D) lattice of finite extent. In particular, we denote an image by $x$ where

$$ x = \{x(m,n),\ (m,n) \in L\}; \qquad L = \{(m,n),\ 1 \le m,n \le N\}. \qquad (1) $$

A random field is a family of random variables defined over the lattice $L$. In this paper,
we use capital letters for random fields and random variables, and lower-case letters for
realizations of random fields and sample values of random variables. A Gaussian random field
representing an observed image can then be defined as

$$ X(m,n) = f(m,n) + W(m,n), \qquad (m,n) \in L, \qquad (2) $$

where $f(m,n)$ is the mean and $W(m,n)$ is a zero-mean Gaussian random sequence, i.e.,
$W(m,n) \sim N(0, \sigma^2(m,n))$. In particular, we assume $f(m,n)$ and $\sigma^2(m,n)$ are constant,
but unknown, for an image class and vary for different classes. In addition, we assume that
the $X(m,n)$'s are independent. The probability density function of the observed random
field is then simply

$$ p(x) = \prod_{(m,n) \in L} \frac{1}{\sqrt{2\pi\sigma^2(m,n)}} \exp\left\{ -\frac{[x(m,n) - f(m,n)]^2}{2\sigma^2(m,n)} \right\}. \qquad (3) $$
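In practice the product density above is evaluated in logarithmic form, which turns the product into a sum and avoids numerical underflow for images of realistic size. A minimal sketch, assuming the image, mean and variance arrays are given as equally sized nested lists; the function name is hypothetical.

```python
import math


def gaussian_log_likelihood(x, f, var):
    """Log of the product density: independent pixels, X(m,n) ~ N(f(m,n), var(m,n)).

    x, f, var: equally sized 2-D nested lists indexed [m][n].
    """
    total = 0.0
    for x_row, f_row, v_row in zip(x, f, var):
        for xv, fv, vv in zip(x_row, f_row, v_row):
            total += -0.5 * math.log(2 * math.pi * vv) - (xv - fv) ** 2 / (2 * vv)
    return total


x = [[10.0, 12.0], [11.0, 9.0]]
f = [[10.0, 10.0], [10.0, 10.0]]   # constant mean: a single image class
var = [[4.0, 4.0], [4.0, 4.0]]     # constant variance
print(gaussian_log_likelihood(x, f, var))
```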

Under a stochastic modeling assumption, the image segmentation problem can be
formulated as a statistical decision problem. Here we take the basic formulation of the
segmentation problem as in [4], [7], [8]. Assume that there are $K$ possible image classes
associated with the $K$ hypotheses $H_k$, $k = 1, 2, \ldots, K$. Suppose that they are distributed
in disjoint regions as shown in Fig. 4.2-1. Each of the image classes is modeled by an
independent Gaussian model corresponding to a particular hypothesis. That is, we have
the $K$ hypothesis classes


$$ H_k:\ X(m,n) = f(k) + W^{(k)}(m,n), \qquad k = 1, 2, \ldots, K, \qquad (4) $$

where

$$ W^{(k)}(m,n) \sim N(0, \sigma^2(k)). \qquad (5) $$

A typical realization of the $K$-class Gaussian random field image is shown in Fig. 4.2-2,
with $K = 3$. Here, the regions are first generated by a 2-D MRF and then “colored” by
the appropriate Gaussian model. The model parameter vectors $\alpha_k$, $k = 1, 2, 3$, are described
in the next section.
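A synthetic test image in the spirit of Fig. 4.2-2 can be produced by coloring a class-label map with the class Gaussian models of (4)-(5). For simplicity this sketch takes the region map as given instead of generating it with a 2-D MRF, and the function name and parameter values are hypothetical; each class is described by the pair (f(k), σ²(k)).

```python
import random


def color_regions(region_map, params, seed=0):
    """Color a class-label map with the Gaussian class models of (4)-(5):
    a pixel of class k receives f(k) + W, with W ~ N(0, sigma^2(k)).

    params[k] = (f(k), sigma^2(k)); the values used below are hypothetical.
    """
    rng = random.Random(seed)
    return [[rng.gauss(params[k][0], params[k][1] ** 0.5) for k in row]
            for row in region_map]


# Hypothetical 16 x 16 map: three vertical bands standing in for MRF-generated regions.
region_map = [[1] * 5 + [2] * 6 + [3] * 5 for _ in range(16)]
params = {1: (60.0, 25.0), 2: (120.0, 16.0), 3: (180.0, 36.0)}
image = color_regions(region_map, params)
```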

In essence, image segmentation is the process of assigning each pixel in the image
to the correct hypothesis class. According to statistical decision theory, an assignment rule
which minimizes the classification error, assuming equally likely hypotheses, is a threshold
test based on the ratios of the class-conditional likelihood functionals, or some monotone
function of them [9]. More specifically, for each point $(i,j)$ in the lattice $L$, we can construct
a window of size $(2M+1) \times (2M+1)$, centered at $(i,j)$ and denoted by $W_{i,j}$. The data
contained in the window are denoted by $X_{i,j}$; that is, $X_{i,j} = \{x(m,n),\ (m,n) \in W_{i,j}\}$,
where

$$ W_{i,j} = \{(m,n),\ i - M \le m \le i + M,\ j - M \le n \le j + M\}, \qquad (6) $$

with $M \ll N$, and boundary effects are ignored. Define the class-conditional log-likelihood
functional, given $H_k$, at $(i,j)$ by

$$ L_k(X_{i,j}) = \log\{p(X_{i,j} \mid H_k)\}. \qquad (7) $$

Then a maximum-likelihood approach is to assign pixel position $(i,j)$ to image class $k_0$ if

$$ k_0 = \arg\max_{1 \le k \le K} L_k(X_{i,j}). \qquad (8) $$
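Under the independent Gaussian model of (4)-(5), the functional (7) has a simple closed form, $L_k(X_{i,j}) = -\tfrac{|W_{i,j}|}{2}\log\left(2\pi\sigma^2(k)\right) - \tfrac{1}{2\sigma^2(k)}\sum_{(m,n)\in W_{i,j}}[x(m,n)-f(k)]^2$ with $|W_{i,j}| = (2M+1)^2$. The sketch below evaluates it and applies rule (8). The function names are hypothetical, and $(i,j)$ is assumed to lie at least $M$ pixels from the image border, matching the text's neglect of boundary effects.

```python
import math


def window_log_likelihood(x, i, j, M, f_k, var_k):
    """L_k(X_ij), eq. (7), under H_k for the (2M+1) x (2M+1) window W_ij of eq. (6).
    Assumes (i, j) lies at least M pixels from every border."""
    size = (2 * M + 1) ** 2
    sq = sum((x[m][n] - f_k) ** 2
             for m in range(i - M, i + M + 1)
             for n in range(j - M, j + M + 1))
    return -0.5 * size * math.log(2 * math.pi * var_k) - sq / (2 * var_k)


def ml_class(x, i, j, M, params):
    """Eq. (8): the class k maximizing L_k(X_ij); params[k] = (f(k), sigma^2(k))."""
    return max(params, key=lambda k: window_log_likelihood(x, i, j, M, *params[k]))


# Same mean, different variances: the decision rests on the variance alone.
x = [[0.0, 20.0, 0.0],
     [20.0, 0.0, 20.0],
     [0.0, 20.0, 0.0]]
params = {1: (10.0, 1.0), 2: (10.0, 100.0)}
print(ml_class(x, 1, 1, 1, params))  # 2
```

The example illustrates that the class-dependent variance, not only the mean, participates in the decision.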

Notice here that if we let $M = 0$, the window will contain only a single pixel, which is an ML
estimation approach under an independence assumption on the pixels [7]. As will be shown
later, the segmentation result with $M = 0$ is somewhat “spotty”, and a proper choice of
$M > 0$ can smooth out most of the noise spots. Notice also that in this segmentation
algorithm the decision at any pixel position is independent of those at the others; hence
it can be implemented in parallel. However, in our implementation we utilize a raster-scan
processing approach which can be summarized as follows:

1.) Process all the pixels in raster-scan order.

2.) At each pixel position, construct a decision window centered at the pixel
(ignoring boundary effects).

3.) Evaluate the class-conditional likelihood functional defined in expression (7) for
each hypothesis.

4.) Assign the pixel to image class $k_0$, $1 \le k_0 \le K$, if it maximizes the
class-conditional likelihood functional as in expression (8).
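Steps 1.)-4.) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: near the border the window is simply clipped to the image, which is our own choice since the text ignores boundary effects, and the class parameters in the example are made up. The example also shows the smoothing effect of choosing $M > 0$ on an isolated noise spot.

```python
import math


def segment(x, params, M=1):
    """Raster-scan windowed ML segmentation following steps 1.)-4.).

    x: 2-D nested list of gray levels; params[k] = (f(k), sigma^2(k)).
    Near the border the window is clipped to the image (our assumption).
    """
    rows, cols = len(x), len(x[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):                      # 1.) raster-scan order
        for j in range(cols):
            window = [x[m][n]                  # 2.) decision window around (i, j)
                      for m in range(max(0, i - M), min(rows, i + M + 1))
                      for n in range(max(0, j - M), min(cols, j + M + 1))]
            best_k, best_L = None, -math.inf
            for k, (f_k, var_k) in params.items():
                # 3.) class-conditional log-likelihood, eq. (7)
                L = sum(-0.5 * math.log(2 * math.pi * var_k)
                        - (v - f_k) ** 2 / (2 * var_k) for v in window)
                if L > best_L:                 # 4.) keep the maximizer, eq. (8)
                    best_k, best_L = k, L
            out[i][j] = best_k
    return out


# Hypothetical two-tone image with one isolated "noise spot" at (1, 1).
image = [[10.0] * 3 + [100.0] * 3 for _ in range(4)]
image[1][1] = 100.0
params = {1: (10.0, 25.0), 2: (100.0, 25.0)}
spotty = segment(image, params, M=0)   # pixel-wise ML decision
smooth = segment(image, params, M=1)   # 3 x 3 decision windows
print(spotty[1][1], smooth[1][1])  # 2 1
```

Because each pixel's decision uses only its own window, the two loops could equally be distributed across parallel workers.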

To  carry  out  the  computations  in  3.)  and  4.)  above,  the  model  parameters  for  each  of  the 
image  classes  are  needed.  In  the  next  section,  we  will  describe  a  method  for  estimating 
the  model  parameters  directly  from  the  image. 

4.2.3  Model  Parameter  Estimation  and  Segmentation  Results: 

The parameter estimation technique used in the ML segmentation approach is similar
to those in our previous work [7], [8], which were quite successful in unsupervised texture
segmentation. More specifically, define the model vector for each class or hypothesis,

$$ \alpha_k = (f(k), \sigma^2(k)), \qquad k = 1, 2, \ldots, K. \qquad (9) $$

Then the $\alpha_k$'s are the model parameters to be estimated from the observed image. As in [7]
and [8], consider a sliding window of size $M_1 \times N_1$, where $M_1 \ll N$ and $N_1 \ll N$, with each
step of the sliding window being displaced $M_2$ pixels vertically and $N_2$ pixels horizontally,
as shown in Fig. 4.2-3. At each position of the sliding window, a Gaussian model vector is
estimated by computing the sample mean and sample variance. This vector is then stored
as a sample vector. Finally, all the sample vectors obtained this way are used as input
to a particular clustering algorithm known as the K-means algorithm [10]. The centroids
of the clusters found in the clustering process are then used as the model parameter vectors
for the underlying image classes, and are used in the model-based windowed ML segmentation
algorithm described in the previous section.
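The estimation stage can be sketched as sliding-window sample vectors (sample mean, sample variance) followed by plain K-means (Lloyd) iterations. The function names, the biased (1/n) sample variance, the deterministic farthest-point initialization and the unscaled Euclidean distance between (mean, variance) pairs are all our assumptions; the text does not specify how the K-means algorithm was initialized or whether the two components were weighted.

```python
def sample_vectors(x, M1, N1, M2, N2):
    """Slide an M1 x N1 window in steps of M2 rows and N2 columns and return one
    (sample mean, sample variance) vector per window position."""
    rows, cols = len(x), len(x[0])
    vectors = []
    for r in range(0, rows - M1 + 1, M2):
        for c in range(0, cols - N1 + 1, N2):
            vals = [x[m][n] for m in range(r, r + M1) for n in range(c, c + N1)]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            vectors.append((mean, var))
    return vectors


def dist2(p, q):
    """Squared Euclidean distance between two sample vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, K, iters=100):
    """Lloyd iterations; centroids start from a farthest-point (maximin) choice."""
    centroids = [points[0]]
    while len(centroids) < K:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(K)]
        for p in points:
            clusters[min(range(K), key=lambda k: dist2(p, centroids[k]))].append(p)
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[k]
               for k, cl in enumerate(clusters)]
        if new == centroids:  # assignments have stabilized
            break
        centroids = new
    return centroids


# Hypothetical 16 x 16 image: two tone classes, each with a small checkerboard variation.
image = [[(10.0 if (m + n) % 2 == 0 else 12.0) if n < 8
          else (100.0 if (m + n) % 2 == 0 else 104.0)
          for n in range(16)] for m in range(16)]
vectors = sample_vectors(image, 4, 4, 4, 4)
print(sorted(k_means(vectors, 2)))  # [(11.0, 1.0), (102.0, 4.0)]
```

Because the mean and variance components have different units, rescaling them before clustering may matter on real imagery.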

A remaining question with this estimation approach is how $K$, the number of
different image classes, is to be determined. In related work [11], we have proposed the use of
an information-theoretic criterion, known as the Akaike Information Criterion (AIC), to
determine the number of classes from the observed image. This scheme has been shown
to provide correct results for synthetic mixture data, and reasonable results for real-world
images that are in close agreement with subjective observations. This scheme is directly
applicable to the present situation. In this paper, however, our interest is to see how
effective the segmentation is under reasonable assumptions on the number of classes. By
reasonable, we mean that the number of classes is approximately equal to the number of
perceptually different tone classes in the image. In the segmentation experiments to follow,
then, we assign the number of classes by observing the images.

Two other problems arise when implementing the estimation algorithm. The first is how to select the sliding window size for model parameter estimation. Although it is not clear quantitatively how the window size affects the estimation accuracy, we can make some qualitative observations. In general, if the window is too large, it might contain a significant amount of data from different classes, resulting in unreliable estimates. On the other hand, if the window is too small, it might not contain enough data to arrive at reasonably accurate estimates. At present, we choose the sliding window size heuristically. For example, we noticed in our experiments that most of the regions are larger than 16x16. As a result, we chose a sliding window size of 16x16. Notice, however, that as long as the window is neither too large nor too small, its size is not very critical, and the same size can be used for a number of images.
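The following sketch shows how sample model vectors might be collected with a sliding window. It assumes that each sample vector is the (mean, variance) of the gray levels inside the window, uses the biased (divide-by-n) variance, and moves the window one pixel at a time by default. `window_samples` and its `step` parameter are hypothetical names.

```python
def window_samples(image, size=16, step=1):
    """Return a (mean, variance) sample vector for each size x size window.

    `image` is a list of rows of gray levels; windows that would extend past
    the image border are skipped.
    """
    rows, cols = len(image), len(image[0])
    samples = []
    for r in range(0, rows - size + 1, step):
        for c in range(0, cols - size + 1, step):
            vals = [image[i][j]
                    for i in range(r, r + size) for j in range(c, c + size)]
            mean = sum(vals) / len(vals)
            # Biased (divide-by-n) variance; the report does not say which
            # variance estimator was used.
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            samples.append((mean, var))
    return samples
```

For a 256x256 image and a 16x16 window moved one pixel at a time, this produces 241 x 241 = 58,081 sample vectors.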

Secondly, even with a properly chosen window size, a window may still contain data from different classes in roughly equal amounts, i.e., the window is “sitting” on a boundary. The sample vectors arising from such situations will affect the accuracy of the estimated class model vectors. As a result, the performance of the segmentation will be degraded. For example, regions that would be well separated if the class model vectors were reasonably accurate may be mixed together, or not separated at all. However, we observed that when a sliding window contains data from different classes, the estimated variance is usually quite large, especially when the difference in gray level is large. Hence, to improve the estimation accuracy, we can reject those sample vectors that have large variance components. For the simple scheme considered here, a threshold, denoted $T_v$, is selected. If the square root of the variance component of a sample vector exceeds $T_v$, the vector is discarded from the clustering procedure. Currently, this threshold is selected heuristically by observing the quality of the corresponding segmentation results. Later in this section, we will show through experimental results that this simple scheme does improve the segmentation. For a completely automatic process, the threshold would have to be selected according to a fixed rule or algorithm. Other, more sophisticated techniques can also be used to obtain reliable model parameter estimates. For example, a statistical test can be performed on the data contained in a number of subwindows of a sliding window to see whether they have the same distribution, that is, whether the gray level in the sliding window is “uniform”. If the data are uniform, an estimated sample model vector is stored; otherwise, it is rejected. In this approach, we still need to decide the size of the sliding window, the number of subwindows, and the significance level of the test. Another approach is to use robust estimation techniques, treating unreliable sample model vectors as “outliers”. These approaches are currently under investigation [12].
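The simple rejection rule can be written as a filter over sample vectors, with the threshold called $T_v$ here. The sketch assumes that each sample is a (mean, variance) tuple, and `reject_boundary_samples` is a hypothetical name. The default threshold is the value 15 used in the experiments reported later in this section.

```python
import math

def reject_boundary_samples(samples, t_v=15.0):
    """Drop (mean, variance) sample vectors whose standard deviation exceeds t_v.

    Windows straddling a region boundary tend to have a large variance, so
    they are excluded before the vectors are clustered.
    """
    return [(mean, var) for (mean, var) in samples if math.sqrt(var) <= t_v]
```

A sample whose standard deviation exactly equals the threshold is kept, because the rule discards only vectors that *exceed* it.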

We  have  applied  the  algorithm  described  in  the  previous  sections  to  the  segmentation 
of  aerial  photographs.  The  images  are  of  size  256x256  and  digitized  to  256  gray-levels. 
The  segmentation  is  performed  for  each  image  under  different  assumptions  on  the  number 
of  classes.  In  the  following  we  will  present  and  discuss  the  experimental  results. 

First, we show that the segmentation results can indeed be improved by rejecting sample model vectors with a large estimated variance component, using the simple scheme described previously. In Fig. 4.2-4, an image containing fields and oil tanks is segmented under the assumption of 3 classes. The size of the decision window is 3x3; that is, M = 1. Figs. 4.2-4b and 4.2-4c show the segmentation results and the estimated model parameter vectors for two cases: not rejecting any sample vector, and rejecting sample vectors with a large variance component. In the latter case we have taken $T_v = 15$. It can be seen that the results are improved considerably. In the remaining experiments, we reject the sample vectors that have a large variance component using $T_v = 15$.

Next, in Fig. 4.2-5, we show the effect of the decision window size M in the ML segmentation approach. It can be seen that, when the decision window size is selected properly, the windowed approach smooths out some noisy spots in the segmentation and significantly improves the segmentation results. In the remaining segmentation experiments, we used decision windows of size 3x3, or M = 1.
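The windowed ML decision rule from the previous section is not reproduced in this excerpt, so the sketch below rests on assumptions. Pixels of class k are treated as independent Gaussians with a per-class (mean, variance). Pixel (i, j) receives the class that maximizes the summed log-likelihood over a (2M+1) x (2M+1) window, which is consistent with M = 1 giving a 3x3 window. The window is clipped at the image border. The function names are hypothetical.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-density of a Gaussian with the given mean and variance at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def windowed_ml_segment(image, classes, m=1):
    """Label each pixel with the class whose summed log-likelihood over the
    (2m+1) x (2m+1) decision window centred on it is largest."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Window clipped at the image border (an assumption).
            window = [image[r][c]
                      for r in range(max(0, i - m), min(rows, i + m + 1))
                      for c in range(max(0, j - m), min(cols, j + m + 1))]
            scores = [sum(gaussian_loglik(v, mean, var) for v in window)
                      for (mean, var) in classes]
            labels[i][j] = max(range(len(classes)), key=scores.__getitem__)
    return labels
```

With m = 0 an isolated noisy pixel keeps its own label. With m = 1 its eight neighbors outvote it, which mirrors the smoothing effect described above.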

Finally, Figs. 4.2-6, 4.2-7 and 4.2-8 show segmentation results for three different aerial photographs under the assumptions of both 3 and 6 classes. In each case, a 3-class assumption separates the different regions of the image reasonably well: buildings, roofs, roads, and vegetation areas are well separated. A 6-class assumption yields a finer segmentation. It should be pointed out, however, that these segmentations are still coarse, in that different real-world objects are assigned to the same class as long as they are close in tonal properties. Regions of the same class that are really different objects could be differentiated using other properties, for example, texture or shape information.

4.2.4  Summary: 

In this paper, we have described an unsupervised Gaussian model-based ML approach to image segmentation. In this approach, different regions are modeled by independent and spatially varying Gaussian random fields. The segmentation problem is formulated as a statistical decision problem, and an ML solution is proposed. The model parameters are estimated directly using a clustering-estimation method. Experiments on the segmentation of aerial photographs show promising results.

This work raises a number of problems for future investigation. First, we need to study methods to determine the threshold $T_v$ for rejecting erroneous sample model vectors directly from the data. Possible solutions are outlined in the previous section, and experiments are needed to investigate their efficacy thoroughly. Another interesting problem is the characterization of the image classes. The independent Gaussian model employed in this work is aimed mainly at the tonal properties of the image classes, while spatial variation or texture is reflected only in the variance of the model. In addition, the independence assumption further limits the characterization of texture properties in image classes. On the other hand, a number of texture-based segmentation schemes do not perform well when the image classes exhibit strong tonal differences [12]. What is needed is a more robust approach that combines the merits of both tonal model-based and texture model-based approaches.
Fig. 4.2-1 An Image Containing Multiple Regions

An Initial Segmentation of an Image.

a.) MRF Generated Region Map.
b.) 3-Class Image; a1 = (70, 8.9), a2 = (100, 14.1), a3 = (150, 10.9).

Fig. 4.2-3 A Sliding Window on the Image Plane

Fig. 4.2-4 Performance Improvement by rejecting large variance components
b.) Segmentation 1, 3 classes, without rejecting model vectors with large variance term; M = 1; a1 = (150, 57), a2 = (203.4, 7.5), a3 = (135.2, 9.0).
c.) Segmentation 2, 3 classes, rejecting model vectors with large variance term; $T_v = 15$, M = 1; a1 = (158.1, 11.5), a2 = (208.7, 3.4), a3 = (131.0, 5.0).

Fig. 4.2-5 Improvement by decision window of size greater than one, T(

b.) 3-Class Segmentation
c.) 6-Class Segmentation

Fig. 4.2-7 Segmentation of Aerial Photo 2, $T_v = 15$, M = 1
a.) Original Image
b.) 3-Class Segmentation
c.) 6-Class Segmentation


4.3 Texture Classification and Discrimination Using the Markov Random Field Model:

Over the last ten years, texture analysis has become a very important area in image processing applications, and many techniques have been proposed and investigated. These techniques can be classified as either statistical or structural. In a statistical approach, texture is characterized in terms of its statistical properties, such as mean, variance, or probability distribution. In a structural approach [1-3], texture is described as a formal language that contains specified primitives as elements and uses a placement rule as its grammar. In this paper, we are interested only in a purely statistical approach.

Most  of  the  existing  statistical  techniques,  as  summarized  by  Haralick  in  T',  are  ad 
hoc in that no stochastic modeling assumptions are made for the texture classes. Textures are described in terms of lower-level features such as mean, variance and correlation functions. Although these features provide some useful information about the texture classes, they are quite limited. As a result, while considerable success has been achieved for some special applications, little is known in general about how good these techniques are and to what types of texture classes they can be applied. In response to this shortcoming, Modestino et al. [4-5] introduced a particular random field model for texture, called the random tessellation process. Under this modeling assumption, texture analysis applications such as classification and discrimination can be formulated as classical statistical decision problems, for which more generally applicable and optimal solutions can then be obtained. However, as pointed out in [4-5], their model has some unsatisfactory features.

Recent developments in Markov random field (MRF) theory provide a powerful alternative texture model and have resulted in intensive research activity in MRF model-based texture analysis techniques [6-9]. Compared to the previously proposed techniques, the MRF model-based approach has several distinguishing features.

First of all, the MRF, also known as the Gibbs Random Field (GRF), is characterized by the joint probability distribution function of the random variables on the entire lattice over which the MRF is defined. This provides complete information about the statistical properties of the random field. Secondly, the joint probability of the random field can be specified in terms of a few parameters, which makes the model mathematically tractable. Finally, synthetic textures that closely resemble real-world textures can be generated by properly selecting a specific MRF model. In this paper, we describe a novel MRF model-based maximum-likelihood (ML) approach to texture classification and discrimination.

In a texture classification problem, an observed image is to be assigned to one of a finite number of classes according to its texture. Abend et al. [10] proposed a Markov mesh model-based approach for texture classification. The Markov mesh extends the Markov chain to two dimensions and therefore has a causal structure. It has recently been shown [11] to be a subclass of the MRF, which is noncausal in general. Chellappa et al. [12] have shown some success in texture classification using a noncausal Gaussian Markov random field model, which is again a specific MRF model [11]. A similar approach is proposed in [13]. In this paper, we consider the general MRF model, which is noncausal and non-Gaussian. As will be seen later, this class of MRF models is more convenient for classifying textures with few gray levels, such as binary textures.

In a texture discrimination problem, an observed image is to be separated into disjoint regions of different textures; hence the problem is also known as texture image segmentation. Derin et al. [8] proposed a maximum a posteriori (MAP) estimation approach to texture discrimination using the MRF model. More specifically, they considered a hierarchical image model. First, the distribution of the texture regions on the image lattice is modeled as a MRF. Then, different texture types are modeled by different MRF models. The maximization in this MAP approach is performed through dynamic programming, with some approximations made on the a posteriori probability functional. The model parameters for the different textures are assumed to be estimated from training data, while the model parameters for the distribution of regions are chosen heuristically. In other words, this is a supervised approach. A similar supervised MAP approach is developed in [13] under a Gaussian Markov modeling assumption.

In this paper, we use a novel unsupervised ML approach. First of all, we do not assume a model for the distribution of regions. Even in those cases for which training data for the different textures are available, training data for the region distribution are rarely available. It is for this reason, in particular, that we have avoided a MAP approach and made use of a simpler ML approach, which requires less a priori knowledge. Secondly, we have considered the case when training data for each texture type are not available. A clustering technique is used to estimate the MRF model parameters directly from the observed image, resulting in an unsupervised scheme. Finally, texture discrimination is accomplished by assigning each pixel of the image to one of the texture classes through an ML test performed on the basis of neighboring pixels. This results in a highly parallel algorithm. As will be shown, this algorithm requires far less computation than a dynamic programming approach.

After  a  brief  review  of  the  MRF  theory  in  the  next  section,  the  ML  approach  for 
texture  classification  and  discrimination  with  corresponding  experimental  results  will  be 
presented  in  Section  4.3.2  and  Section  4.3.3,  respectively.  A  summary  is  provided  in  Section 
4.3.4. 

4.3.1  The  Markov  Random  Field  Model: 

The  MRF  model  used  in  this  paper  originated  from  studies  in  statistical  physics  and 
recently  has  been  adapted  to  an  image  processing  context.  In  this  section,  we  review  some 
basic  theory  and  some  specific  MRF  models  that  will  be  used  later. 

For simplicity, we consider only digital images, that is, images of finite size with a finite number of gray levels. In particular, we define an image to be a two-dimensional (2-D) array over a finite square lattice, denoted by $f = \{f(i,j), (i,j) \in L\}$, where $L = \{(i,j) : 1 \le i, j \le N\}$ and $f(i,j)$ can assume only a finite number of values.

A random field is defined to be a family of random variables defined over the 2-D lattice L. Denote the random field by $X$ and the random variable at $(i,j)$ by $X(i,j)$; then $X = \{X(i,j), (i,j) \in L\}$. In statistical image modeling, images are considered as realizations of random fields. In this paper, capital letters are used for random fields or random variables, while lowercase letters are used for realizations or sample values.

A MRF on a 2-D lattice is a random field with a special property: the statistics of a point in the lattice, given those of the rest of the lattice, depend only on a few points known as its neighbors. More rigorous definitions follow, starting with the concept of a neighborhood system.

Definition 1: A collection of subsets of L, $\eta = \{\eta(i,j) : (i,j) \in L,\ \eta(i,j) \subset L\}$, is a neighborhood system on L if and only if

(i) $(i,j) \notin \eta(i,j)$;

(ii) if $(k,l) \in \eta(i,j)$, then $(i,j) \in \eta(k,l)$.

Some typical neighborhood system configurations are shown in Fig. 4.3-1. As indicated there, a neighborhood system can be classified as first-order, second-order, etc., according to the number of neighbors each lattice point has. To avoid boundary problems, a periodic lattice structure is assumed, so that all the points in L have the same number of neighbors. A MRF is then defined with respect to a specified neighborhood system.
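The sketch below builds first- and second-order neighborhoods on an n x n periodic lattice and checks the two conditions of Definition 1. The offset lists are our reading of the first- and second-order configurations (4 and 8 nearest neighbors), since Fig. 4.3-1 is not reproduced here. All names are hypothetical.

```python
# Offsets (di, dj) of the neighbours of a lattice point (assumed configurations).
FIRST_ORDER = [(0, -1), (0, 1), (-1, 0), (1, 0)]
SECOND_ORDER = FIRST_ORDER + [(-1, -1), (1, 1), (-1, 1), (1, -1)]

def neighbors(i, j, n, offsets):
    """Neighbours of (i, j) on an n x n periodic (toroidal) lattice."""
    return {((i + di) % n, (j + dj) % n) for (di, dj) in offsets}

def is_neighborhood_system(n, offsets):
    """Check Definition 1: (i) no point is its own neighbour, (ii) symmetry."""
    for i in range(n):
        for j in range(n):
            nb = neighbors(i, j, n, offsets)
            if (i, j) in nb:
                return False
            if any((i, j) not in neighbors(k, l, n, offsets) for (k, l) in nb):
                return False
    return True
```

Both offset lists are closed under negation, and that property is what makes condition (ii) hold. A one-sided list such as `[(0, 1)]` fails the check.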

Definition 2: Let $\eta$ be a neighborhood system over the 2-D lattice L. A random field $X = \{X(i,j), (i,j) \in L\}$ is a MRF with respect to $\eta$ if and only if

(i)
$$P[X = x] > 0 \quad \text{for all } x, \tag{1a}$$

(ii)
$$P[X(i,j) = x(i,j) \mid X(k,l) = x(k,l), (k,l) \in L, (k,l) \neq (i,j)] = P[X(i,j) = x(i,j) \mid X(k,l) = x(k,l), (k,l) \in \eta(i,j)], \tag{1b}$$

where $P[\cdot]$ and $P[\cdot \mid \cdot]$ indicate the joint and conditional probability distributions of the random field, respectively. The order of the neighborhood system $\eta$ is called the order of the MRF, and the conditional probabilities in (1b) are also called the local characteristics.

The  concept  of  the  MRF  would  not  be  very  useful  for  practical  applications  if  it  were 
not  for  the  Hammersley  and  Clifford  theorem  which  establishes  the  relation  between  the 
MRF  and  the  Gibbs  Random  Field  (GRF)  and  hence  provides  the  functional  form  of  the 
joint  probability  distribution  function  for  a  MRF.  Before  the  GRF  can  be  defined,  the 
important  concept  of  a  clique  must  be  introduced. 

Definition 3: Given a lattice and neighborhood system pair $(L, \eta)$, a clique on the lattice, denoted by c, is a subset of L such that

(i) c contains at least a single point of L;

(ii) if $(k,l) \in c$, $(i,j) \in c$ and $(i,j) \neq (k,l)$, then $(i,j) \in \eta(k,l)$.

In particular, the collection of all the cliques of the pair $(L, \eta)$ is denoted by $C(L, \eta)$. Examples of clique types under different neighborhood systems are shown in Fig. 4.3-2. Now, the GRF can be defined as follows:
Definition 4: A random field $X = \{X(i,j), (i,j) \in L\}$ is a GRF with respect to a given neighborhood system $\eta$ if and only if its joint probability distribution function is of the following form:

$$P[X = x] = Z^{-1} \exp[-U(x)], \tag{2a}$$

where

$$U(x) = \sum_{c \in C(L,\eta)} V_c(x) \tag{2b}$$

and

$$Z = \sum_{\text{all } x} \exp[-U(x)]. \tag{2c}$$

Here, $V_c(x)$ is called the clique function; it depends only on the points in clique c. Z, called the partition function, is a normalizing factor that makes (2a) a valid probability distribution. Notice that the GRF is defined in terms of its joint probability distribution, which provides complete information about the random field, while in the case of a MRF little is known about the joint probability distribution. Similarly, the conditional probabilities or local characteristics of a GRF can be found from the joint probability distribution, while in the case of the MRF the conditional probabilities are not readily apparent. Hammersley and Clifford established the equivalence between the MRF and the GRF, making the MRF a feasible model for practical applications such as texture modeling. Their theorem is stated below without proof; the proof is rather involved and can be found in Besag's work [14].

Theorem: A random field $X = \{X(i,j), (i,j) \in L\}$ defined over L is a MRF with respect to the given neighborhood system $\eta$ if and only if it is a GRF with respect to $\eta$.

In this paper, two MRF models will be used as texture models. The following examples present them in terms of their conditional probability distributions. These models have been widely used for real-world two-dimensional (2-D) phenomena, including textures, and have been shown to be simple and effective [6-10]. They are the main MRF texture models used in this paper. However, other models can also be defined for applications of interest by properly selecting the clique functions [14].

A.)  Example  of  a  First-Order  MRF: 

Consider a first-order MRF with the neighborhood system and clique types shown in Figs. 4.3-1a and 4.3-2a. The joint probability distribution function of this MRF is:

$$P[X = x] = Z^{-1} \exp\Big[a \sum_{(i,j) \in L} x(i,j) + b_1 \sum_{(i,j) \in L} x(i,j)\,x(i,j-1) + b_2 \sum_{(i,j) \in L} x(i,j)\,x(i-1,j)\Big]. \tag{3}$$

Notice that the above summations assume a periodic lattice structure. The local characteristics of the MRF can easily be found from Bayes' conditional probability formula as:

$$P[X(i,j) = x(i,j) \mid X(k,l) = x(k,l), (k,l) \in \eta(i,j)] = \frac{\exp[x(i,j)\,t(i,j)]}{\sum_{x(i,j)} \exp[x(i,j)\,t(i,j)]}, \tag{4}$$

where

$$t(i,j) = a + b_1[x(i,j-1) + x(i,j+1)] + b_2[x(i-1,j) + x(i+1,j)] \tag{5}$$

and the sum in the denominator of (4) is over all possible values of $x(i,j)$. The special case $b_1 = b_2 = b$ is called an isotropic MRF.
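The following sketch of the local characteristic (4)-(5) assumes a lattice stored as a list of rows, periodic wrap-around, and gray levels {0, 1} by default (a binary MRF). The function name is hypothetical. Passing `levels=range(G)` applies the same formula to G gray levels.

```python
import math

def first_order_local_characteristic(x, i, j, a, b1, b2, levels=(0, 1)):
    """Return {v: P[X(i,j) = v | first-order neighbours]} following (4)-(5)."""
    rows, cols = len(x), len(x[0])
    # Statistic t(i,j) of (5); indices wrap because the lattice is periodic.
    t = (a
         + b1 * (x[i][(j - 1) % cols] + x[i][(j + 1) % cols])
         + b2 * (x[(i - 1) % rows][j] + x[(i + 1) % rows][j]))
    # Numerators exp[v * t(i,j)] of (4), normalised by their sum over v.
    weights = {v: math.exp(v * t) for v in levels}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}
```

Calling it with `b1 == b2` gives the isotropic case.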

B.)  Example  of  a  Second-Order  MRF: 

This MRF model has the second-order neighborhood system and clique types shown in Figs. 4.3-1b and 4.3-2b, with the following joint probability distribution function:

$$P[X = x] = Z^{-1} \exp\Big[a \sum_{(i,j) \in L} x(i,j) + b_1 \sum_{(i,j) \in L} x(i,j)\,x(i,j-1) + b_2 \sum_{(i,j) \in L} x(i,j)\,x(i-1,j) + c_1 \sum_{(i,j) \in L} x(i,j)\,x(i-1,j-1) + c_2 \sum_{(i,j) \in L} x(i,j)\,x(i-1,j+1)\Big]. \tag{6}$$

Again, the periodic assumption is made for the above summations. As in the previous example, the local characteristics can be found as:

$$P[X(i,j) = x(i,j) \mid X(k,l) = x(k,l), (k,l) \in \eta(i,j)] = \frac{\exp[x(i,j)\,t(i,j)]}{\sum_{x(i,j)} \exp[x(i,j)\,t(i,j)]}, \tag{7}$$

where now

$$t(i,j) = a + b_1[x(i,j-1) + x(i,j+1)] + b_2[x(i-1,j) + x(i+1,j)] + c_1[x(i-1,j-1) + x(i+1,j+1)] + c_2[x(i-1,j+1) + x(i+1,j-1)]. \tag{8}$$
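For the second-order model, only the statistic t(i,j) of (8) changes. The sketch below again assumes a periodic lattice stored as a list of rows and binary gray levels by default. The helper names are hypothetical. Setting c1 = c2 = 0 recovers the first-order statistic (5).

```python
import math

def second_order_t(x, i, j, a, b1, b2, c1, c2):
    """Statistic t(i,j) of (8) on a periodic lattice."""
    rows, cols = len(x), len(x[0])

    def at(r, c):
        # Periodic wrap-around in both directions.
        return x[r % rows][c % cols]

    return (a
            + b1 * (at(i, j - 1) + at(i, j + 1))
            + b2 * (at(i - 1, j) + at(i + 1, j))
            + c1 * (at(i - 1, j - 1) + at(i + 1, j + 1))
            + c2 * (at(i - 1, j + 1) + at(i + 1, j - 1)))

def second_order_local_characteristic(x, i, j, params, levels=(0, 1)):
    """List of P[X(i,j) = v | neighbours] for v in `levels`, following (7).

    `params` is the tuple (a, b1, b2, c1, c2).
    """
    t = second_order_t(x, i, j, *params)
    weights = [math.exp(v * t) for v in levels]
    total = sum(weights)
    return [w / total for w in weights]
```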

Suppose a given image $f = \{f(i,j), (i,j) \in L\}$ is modeled by a specific MRF. We wish to estimate the model parameters of the MRF from the image data. The ML approach developed later is closely related to the parameter estimation algorithm, so we describe that algorithm in detail.

Let the parameters be denoted by the vector $\mathbf{a}$. For example, for the first-order isotropic MRF, $\mathbf{a} = (a, b)$. The maximum-likelihood (ML) estimate of $\mathbf{a}$, denoted $\hat{\mathbf{a}}_{ML}$, is obtained by maximizing the likelihood functional $L(f;\mathbf{a}) = P(f \mid \mathbf{a})$, where $P(f \mid \mathbf{a})$ is the joint probability of f as a realization of the MRF given $\mathbf{a}$ as the parameter vector. Once the model is chosen, $P(f \mid \mathbf{a})$ is a functional of the vector $\mathbf{a}$. Although this estimate is optimum, it is difficult to compute. The reason is that computing $P(f \mid \mathbf{a})$ requires the normalization factor Z in (2c). Z in turn involves all the possible realizations of the MRF and, in this case, is also a functional of $\mathbf{a}$. The computation is almost impossible even for a binary MRF (BMRF) on a reasonably small lattice. Clearly, a suboptimum technique is needed that preserves some of the optimality of the ML approach and yet is computationally feasible. Besag's coding method [14] is such a technique.
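The difficulty with Z can be seen directly in the following sketch. It takes the binary isotropic first-order model of (3) with $b_1 = b_2 = b$ on a periodic n x n lattice and computes Z of (2c) by brute-force enumeration. The function names are hypothetical. The loop runs $2^{n^2}$ times: 512 for n = 3, but already about $1.8 \times 10^{19}$ for n = 8.

```python
import itertools
import math

def sufficient_statistics(x, n):
    """Return (sum of x(i,j), sum of neighbouring-pair products) as in (3)."""
    s1 = sum(x[i][j] for i in range(n) for j in range(n))
    # Each horizontal and vertical pair is counted once (periodic wrap-around).
    s2 = sum(x[i][j] * (x[i][(j - 1) % n] + x[(i - 1) % n][j])
             for i in range(n) for j in range(n))
    return s1, s2

def partition_function(n, a, b):
    """Brute-force Z of (2c): sum exp[a*s1 + b*s2] over all 2**(n*n) images."""
    z = 0.0
    for bits in itertools.product((0, 1), repeat=n * n):
        x = [bits[r * n:(r + 1) * n] for r in range(n)]
        s1, s2 = sufficient_statistics(x, n)
        z += math.exp(a * s1 + b * s2)
    return z
```

With a = b = 0 every configuration has weight 1, so Z simply counts the $2^{n^2}$ binary images.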

In this coding method, the 2-D lattice is separated into disjoint sets of points, called codings, according to the neighborhood system of the MRF. The codings are defined so that no two points in the same coding are neighbors; consequently, the points in each coding are conditionally independent given the random variables on the other codings. Examples of codings are shown in Fig. 4.3-3 for the first- and second-order MRFs discussed in this section.
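A standard choice of codings, which we assume matches Fig. 4.3-3, is a checkerboard (two codings) for the first-order system and a 2 x 2 pattern (four codings) for the second-order system. The sketch below constructs them on an n x n periodic lattice, with n even so the patterns wrap consistently. It then checks that no two points of a coding are neighbors. All names are hypothetical.

```python
FIRST_ORDER = [(0, -1), (0, 1), (-1, 0), (1, 0)]
SECOND_ORDER = FIRST_ORDER + [(-1, -1), (1, 1), (-1, 1), (1, -1)]

def codings(n, order):
    """Split an n x n periodic lattice (n even) into Besag-style codings."""
    if n % 2:
        raise ValueError("n must be even for the patterns to wrap around")
    if order not in (1, 2):
        raise ValueError("only first- and second-order systems are handled")
    groups = {}
    for i in range(n):
        for j in range(n):
            # Checkerboard label for order 1, 2x2 block label for order 2.
            label = (i + j) % 2 if order == 1 else 2 * (i % 2) + (j % 2)
            groups.setdefault(label, []).append((i, j))
    return [groups[key] for key in sorted(groups)]

def is_valid_coding(coding, n, offsets):
    """True if no point of the coding has a neighbour in the same coding."""
    members = set(coding)
    return all(((i + di) % n, (j + dj) % n) not in members
               for (i, j) in coding for (di, dj) in offsets)
```

The checkerboard is valid for the first-order system but not the second. Diagonal points such as (0, 0) and (1, 1) share a checkerboard coding, yet they are second-order neighbors.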

Suppose that for a given MRF there are M codings, denoted by $C_1, C_2, \ldots, C_M$. Define for the m'th coding the following coding-likelihood:

$$L_m(f;\mathbf{a}) = P\big[F(i,j) = f(i,j), (i,j) \in C_m \,\big|\, \mathbf{a}, F(k,l) = f(k,l), (k,l) \in C_q \text{ for all } 1 \le q \le M, q \ne m\big]; \quad m = 1, 2, \ldots, M. \tag{9}$$

Since the points in a fixed coding are conditionally independent, $L_m(f;\mathbf{a})$ can also be written as

$$L_m(f;\mathbf{a}) = \prod_{(i,j) \in C_m} P\big[F(i,j) = f(i,j) \,\big|\, \mathbf{a}, F(k,l) = f(k,l), (k,l) \in \eta(i,j)\big], \tag{10}$$

where $P[\cdot \mid \cdot]$ is the local characteristic, which is easy to compute. Therefore, $L_m(f;\mathbf{a})$ is also easy to compute. The m'th coding estimate of the parameter vector is obtained by maximizing $L_m(f;\mathbf{a})$ with respect to $\mathbf{a}$, for m = 1, 2, ..., M. The resulting estimate will be denoted by $\hat{\mathbf{a}}^{(m)}_{MCL}$, where MCL stands for maximum coding likelihood estimation and the superscript indicates the m'th coding. Besag [14], Cross and Jain [7], and the present authors have observed that this coding method provides very accurate estimates. Also, the estimates obtained by using different codings are very close to each other. In the remainder of this paper, for definiteness, we compute the maximum coding-likelihood estimate as the average over all m. That is, we take

$$\hat{\mathbf{a}}_{MCL} = \frac{1}{M} \sum_{m=1}^{M} \hat{\mathbf{a}}^{(m)}_{MCL}. \tag{11}$$
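The following is a minimal sketch of maximum coding-likelihood estimation for the binary first-order isotropic model on a periodic n x n lattice stored as a list of rows. It evaluates the logarithm of (10), using $\log P = x\,t - \log(1 + e^{t})$ for $x \in \{0, 1\}$, which follows from (4). It then maximizes this quantity over a user-supplied grid of (a, b) values and averages the per-coding estimates as in (11). The grid search is a crude optimizer chosen only to keep the sketch dependency-free; it is not the authors' procedure. Names are hypothetical.

```python
import math

def coding_loglik(x, coding, a, b):
    """log L_m(f; a, b) of (10) for a binary first-order isotropic MRF."""
    n = len(x)
    total = 0.0
    for (i, j) in coding:
        s = (x[i][(j - 1) % n] + x[i][(j + 1) % n]
             + x[(i - 1) % n][j] + x[(i + 1) % n][j])
        t = a + b * s
        # log of (4) for gray levels {0, 1}: x*t - log(1 + e^t)
        total += x[i][j] * t - math.log1p(math.exp(t))
    return total

def mcl_estimate(x, codings, grid_a, grid_b):
    """Maximise each coding likelihood over a grid, then average as in (11)."""
    estimates = []
    for coding in codings:
        best = max(((a, b) for a in grid_a for b in grid_b),
                   key=lambda p: coding_loglik(x, coding, *p))
        estimates.append(best)
    m = len(estimates)
    return (sum(e[0] for e in estimates) / m,
            sum(e[1] for e in estimates) / m)
```

For this model, the logarithm of (10) is a logistic-regression log-likelihood in (a, b) and is therefore concave, so a gradient or Newton method could replace the grid.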


4.3.2  Texture  Classification: 

In  this  section,  we  will  first  develop  the  MRF  model-based  ML  approach  for  texture 
classification  for  the  case  where  training  data  is  available  and  then  show  that  it  can  be 
combined  with  a  clustering  algorithm  when  training  data  is  not  available.  A  block  diagram 
outlining  an  approach  to  the  texture  classification  problem  is  shown  in  Fig.  4.3-4.  The 
inputs  to  the  classifier  are  digital  images  containing  texture  data  from  one  of  a  finite 
number  of  texture  classes.  These  images  are  separated  into  the  unknown  or  test  set  of 
images,  whose  texture  class  is  unknown,  and  the  training  set  of  images,  whose  texture  class 
is  known  a  priori.  The  training  set  is  necessary  to  provide  information  that  will  be  used 
by  the  classifier  in  the  decision  process.  In  the  parameter  estimation  stage,  information 
essential  for  differentiating  the  texture  classes  is  estimated  from  the  training  set  of  images 
and used to adapt the classifier to all the possible texture classes. The classifier then processes the unknown images to decide which texture class is present.

Let the image data on which the classifier is to operate be denoted by $f = \{f(i,j), (i,j) \in L\}$. Assume that there are K texture classes or hypotheses labeled by $H_k$, k = 0, 1, 2, ..., K - 1. The class-conditional likelihood functional [15], assuming the k'th hypothesis is acting, is then defined as

$$L_k(f) = P(f \mid H_k); \quad k = 0, 1, \ldots, K-1, \tag{12}$$

where $P(f \mid H_k)$ is the joint probability distribution of the random field assuming that hypothesis $H_k$ is acting.
If the MRF described in Section 4.3.1 is used as the texture model, the likelihood functional in (12) is the joint probability of the MRF model for the k'th texture class. In this paper, we assume that different texture classes are modeled by MRFs whose probability distributions have the same functional form but different parameters. For example, if the first-order isotropic model is used, the likelihood functional is

$$L_k(f) = P[f \mid \mathbf{a}_k] = P[f \mid (a_k, b_k)]; \quad k = 0, 1, \ldots, K-1, \tag{13}$$

where $\mathbf{a}_k = (a_k, b_k)$ is the parameter vector for class k. The ML decision rule minimizes the classification error probability when the classes are equally likely a priori. Under this rule, the data are assigned to the texture class $k_0$ whose index maximizes the class-conditional likelihood functional in (12).

Although this approach is optimum, it is usually difficult to implement. Just as in parameter estimation, computing the normalization factor Z in $P[f \mid \mathbf{a}_k]$ involves all the possible realizations of the MRF. A reasonable suboptimum approach is to use a likelihood functional that is closely related to the joint probability function and yet is easy to compute. The coding likelihood used in Besag's coding method for model parameter estimation can serve as the basis for such an approach. Instead of the joint probability, we use as the likelihood functional the coding likelihood defined as follows. Suppose there are M codings, denoted by $C_1, C_2, \ldots, C_M$; the class-conditional likelihood evaluated on the m'th coding is

$$L_{m,k}(f) = P\big[f(i,j), (i,j) \in C_m \,\big|\, \mathbf{a}_k, f(p,q), (p,q) \notin C_m\big] = \prod_{(i,j) \in C_m} P\big[f(i,j) \,\big|\, \mathbf{a}_k, f(p,q), (p,q) \in \eta(i,j)\big]; \quad m = 1, 2, \ldots, M;\ k = 0, 1, \ldots, K-1. \tag{14}$$

As mentioned previously, the coding likelihoods computed for different codings are very close. In principle, we could choose any value of m = 1, 2, ..., M and perform ML classification on the basis of $L_{m,k}(f)$, k = 0, 1, ..., K - 1. However, for definiteness, we have again chosen to average the various coding likelihoods. More specifically, define

$$\bar{L}_k(f) = \frac{1}{M} \sum_{m=1}^{M} L_{m,k}(f); \quad k = 0, 1, \ldots, K-1. \tag{15}$$

The decision rule then becomes: assign the data to class $k_0$ if

$$\bar{L}_{k_0}(f) \ge \bar{L}_k(f) \quad \text{for all } k = 0, 1, \ldots, K-1.$$

When training data is available, this suboptimum ML classifier can be implemented with the parameter vectors estimated for each texture class from the training data set using Besag's coding method. When training data is not available, as is often the case in practical applications, the parameter vector for each texture class can be obtained as follows. First, model parameter vectors are estimated from every observed image. Then these vectors, also called samples, are grouped into several disjoint sets called clusters. Finally, the centroids of the clusters are used as the estimated class model vectors in the ML classifier. This is usually referred to as a clustering procedure in pattern recognition, and the algorithm which performs the grouping is called a clustering algorithm. There are many clustering algorithms available [16], [17]. In this paper we make use of the K-means algorithm [17]. It has been shown that this algorithm is optimal under a specific cluster criterion function and convergent under a well-defined condition on the distribution of samples. It is also simple to implement. The major disadvantage of this algorithm is that the number of clusters has to be known before applying the algorithm, which is sometimes an impractical assumption. A number of techniques, mostly heuristic, have been proposed to determine the number of clusters, or classes, and there is no well-accepted theory [16], [18]. In this paper we assume that the number of classes is known or predetermined, and we pursue this problem in separate work.
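The clustering step can be sketched with a plain K-means (Lloyd) iteration over the estimated parameter vectors. This is a generic version, not necessarily the exact variant of [17]; the farthest-point initialization is an assumption made here so the result is deterministic, and the function name is hypothetical.

```python
import numpy as np


def k_means(samples, K, n_iter=100):
    """Lloyd's K-means on MRF parameter vectors (one row per observed image).

    Returns (centroids, labels); the centroids play the role of the estimated
    class model vectors. Farthest-point initialization is an assumption.
    """
    X = np.asarray(samples, dtype=float)
    centroids = [X[0]]
    for _ in range(1, K):
        # next seed: the sample farthest from all seeds chosen so far
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # assignment step: nearest centroid in squared Euclidean distance
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each centroid moves to the mean of its samples
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

K must be supplied by the caller, which is exactly the limitation noted above.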

Texture classification experiments have been performed on both synthetic and natural texture classes to test the efficacy of the ML approach described above. The synthetic textures used are realizations of binary Markov random fields (BMRF's), while the natural textures are equal-probability quantized binary images from Brodatz's photo album [19]. For each case, the classification is performed either with the aid of training data or using the clustering method. The experimental results are presented below.
A.) Supervised Classification of Synthetic Textures:

The texture classes used in this experiment are generated as follows. First, a realization of a binary MRF of size 240 x 240, specified by the parameter vector $\boldsymbol{\alpha}_k = (a_k, b_k)$ of class k, is generated using Geman and Geman's algorithm [9]. Next, each 240 x 240 image is cut into nine 80 x 80 subimages. Finally, the subimage at the upper-left corner is used as the training data for that texture class, while the remaining subimages are taken as test data.
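The data preparation above amounts to tiling each realization and holding out the upper-left tile; a short sketch (function names are hypothetical) is:

```python
import numpy as np


def split_into_blocks(image, block):
    """Cut an image into non-overlapping block x block subimages, row by row."""
    H, W = image.shape
    if H % block or W % block:
        raise ValueError("image sides must be multiples of the block size")
    return [image[r:r + block, c:c + block]
            for r in range(0, H, block)
            for c in range(0, W, block)]


def train_test_split(image, block):
    """Upper-left subimage -> training data; the remaining subimages -> test data."""
    blocks = split_into_blocks(image, block)
    return blocks[0], blocks[1:]
```

For a 240 x 240 realization and `block=80` this yields one 80 x 80 training subimage and eight test subimages per class.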

The first-order isotropic BMRF is used in both texture generation and parameter estimation. Four texture classes are generated. The model parameters estimated from the training data for each texture class are shown in Table 4.3-1a, along with the actual model parameters used to generate the texture classes. The 240 x 240 image for each texture class is shown in Fig. 4.3-5; it can be seen from these images that the classes exhibit different degrees of clustering of like-valued pixels. The classification results are shown in the contingency table in Table 4.3-2a. All the data are correctly classified. Similar results have been obtained for second-order MRF's [19].

B.) Unsupervised Classification of Synthetic Textures:

In this experiment all the subimages in A.) are used as test data. The clustering algorithm described previously is applied to the set of model vectors estimated from the thirty-six subimages, using the same BMRF model as in A.) and assuming the number of classes is known to be four. The cluster centroids are shown in Table 4.3-1b along with the model parameters that generated the synthetic textures, and the classification results are shown in Table 4.3-2b. The estimated model vectors obtained by clustering are very close to the actual values, and all data are correctly assigned.

C.) Supervised Classification for Natural Textures:

The natural textures used in this experiment are four texture images from Brodatz's photo album [20]. They were originally 256-gray-level images of size 128 x 128. An equal-probability quantization is performed to transform these textures into binary images. Figure 4.3-6 shows the binary quantized images of these texture classes. The training and test data sets are obtained as follows. First, each image is cut into four 64 x 64 subimages. Then the subimage at the upper-left corner is used as training data, while the rest are used as test data for that texture class. Usually, natural textures are modeled by MRF models of order higher than one [7]. However, we found that when these images are fitted with the second-order MRF model, the parameters $c_1$ and $c_2$ are quite small compared to the other parameters in (8); hence all the binary texture classes are modeled as first-order BMRF's. The class parameter vector for each class is estimated from training data and shown in Table 4.3-3a, with the corresponding classification results in Table 4.3-4a. All the subimages have been correctly classified.
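The exact quantizer is not spelled out here, so the following is only one plausible reading of "equal-probability quantization", offered as an assumption: thresholds are placed at empirical quantiles of the gray-level histogram so that each output level receives (up to ties) the same fraction of pixels. For binary output the single threshold is the median gray level. The function name is hypothetical.

```python
import numpy as np


def equal_probability_quantize(image, levels=2):
    """Requantize gray levels so each output level holds about 1/levels of the pixels.

    Thresholds are the empirical quantiles at 1/levels, 2/levels, ...;
    ties in the input can make the split slightly unequal.
    """
    img = np.asarray(image)
    thresholds = np.quantile(img, np.linspace(0.0, 1.0, levels + 1)[1:-1])
    # number of thresholds <= each pixel value = its output level (0 .. levels-1)
    return np.searchsorted(thresholds, img, side="right")
```

Applied with `levels=2` to a 256-level Brodatz image, this produces a binary image with roughly half of the pixels at each value, which suits a binary MRF model.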

D.) Unsupervised Classification for Natural Textures:

The results for this part are obtained in the same way as in C.), except that the class model vectors are obtained through clustering, assuming the number of classes is known to be four. The centroids of the clusters and the classification results are shown in Tables 4.3-3b and 4.3-4b, respectively. All the data are correctly classified.

4.3.3  Texture  Discrimination: 

Unlike texture classification, which assigns an entire image to a specific class, the interest now is to discriminate between different texture classes within the image. The ML approach of [4] is adapted, under a MRF modeling assumption, to develop a new likelihood functional using the information provided by the MRF model. Discrimination experiments have been performed on test images containing synthetic and natural textures.

Assume that the textured image $\mathbf{f} = \{f(i,j), (i,j) \in L\}$ is a realization of a random field $\mathbf{F} = \{F(i,j), (i,j) \in L\}$, and that the lattice L can be decomposed into regions $R_1, \ldots, R_Q$ of K different textures, as shown in Fig. 4.3-7. That is,

$$L = \bigcup_{q=1}^{Q} R_q, \qquad (17)$$

where $K \le Q$.

We will model the texture class within each region as a MRF defined over that region. Regions belonging to the same texture class will have the same MRF model vector. Suppose each pixel (i, j) of the image belongs to one of K texture classes, denoted by the hypotheses $H_k$, k = 0, 1, ..., K-1. Texture discrimination is the process in which each pixel is assigned to a particular class. Suppose a window of size (2M + 1) x (2M + 1) is constructed for each pixel position (i, j), and the pixels within this window are denoted by $T_{i,j} = \{f(k,\ell), (k,\ell) \in W_{i,j}\}$, where

$$W_{i,j} = \{(k,\ell) : i - M \le k \le i + M,\ j - M \le \ell \le j + M\},$$

with $M \ll N$ and the periodic boundary condition imposed. The likelihood functional in this case, given that the k'th hypothesis is acting, is defined as

$$L_k\{T_{i,j}\} = P\{T_{i,j} \mid H_k\}, \qquad k = 0, 1, \ldots, K-1. \qquad (18)$$
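Extracting the decision window $W_{i,j}$ with the periodic boundary condition is a matter of wrapping the row and column indices modulo the image size; a minimal sketch (the function name is hypothetical):

```python
import numpy as np


def decision_window(f, i, j, M):
    """Return the (2M+1) x (2M+1) window W_{i,j} centered at (i, j).

    Indices wrap around the image edges, i.e. the periodic boundary condition.
    """
    rows = np.arange(i - M, i + M + 1) % f.shape[0]
    cols = np.arange(j - M, j + M + 1) % f.shape[1]
    return f[np.ix_(rows, cols)]
```

With this convention every pixel, including those on the border, has a full window, so the decision rule treats all pixels alike.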

Pixel (i, j) will be assigned to texture class $k_0$ if

$$L_{k_0}\{T_{i,j}\} = \max_{0 \le k \le K-1} L_k\{T_{i,j}\}. \qquad (19)$$

After  this  procedure  is  performed  for  all  the  pixels,  the  image  will  be  segmented  into 
disjoint  regions  which  belong  to  different  texture  classes  in  such  a  way  as  to  minimize  the 
classification  error  [4]. 

Although this method is theoretically optimal, the joint probability of the pixels in the window is hard to evaluate, which complicates the evaluation of $L_k\{T_{i,j}\}$. For example, assuming the texture in each region $R_q$ is modeled as a MRF, the likelihood functional in (18) can be written as

$$L_k\{T_{i,j}\} = P[f(k,\ell), (k,\ell) \in R_{i,j} \mid H_k], \qquad k = 0, 1, \ldots, K-1, \qquad (20)$$

where $P[\cdot \mid H_k]$ is the class-conditional joint probability distribution of the MRF and $R_{i,j}$ is the one of the regions $R_q$ in (17) that contains pixel (i, j). The difficulty in evaluating (20) is that region $R_{i,j}$ is unknown before the discrimination process. Again, as in the case of texture classification, a suboptimal approach is desired which preserves some of the optimality of the previous likelihood functional and yet is easy to compute. The ML approach developed in this section is one such approach, and it uses the coding structure. In this approach, the likelihood is the coding likelihood evaluated only within the decision window centered at the pixel to be classified. More specifically, we choose the coding which contains the center point of the window, denoted by $C_{i,j}$. The ML approach can now be described as follows. Define the class-conditional coding likelihood as
$$L_{c,k}\{T_{i,j}\} = P[f(k,\ell), (k,\ell) \in C_{i,j} \cap W_{i,j} \mid f(m,n), (m,n) \in R_{i,j}, (m,n) \notin C_{i,j}, H_k]$$

$$= \prod_{(k,\ell) \in C_{i,j} \cap W_{i,j}} P[f(k,\ell) \mid f(m,n), (m,n) \in \eta(k,\ell), H_k], \qquad k = 0, 1, \ldots, K-1. \qquad (21)$$

This likelihood can be easily computed from the local characteristics, and the discriminator assigns pixel (i, j) to texture class $k_0$ if

$$L_{c,k_0}\{T_{i,j}\} = \max_{0 \le k \le K-1} L_{c,k}\{T_{i,j}\}. \qquad (22)$$
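A sketch of the discriminator (21)-(22) for binary first-order textures, assuming the same autologistic local characteristic as before and the checkerboard codings, so that $C_{i,j}$ is the set of window sites with the same parity of i + j as the center pixel. Every pixel is scored independently, which mirrors the parallelism noted below; the window sums are done with array shifts. All names are hypothetical, and the image sides should be even for the periodic checkerboard to be consistent.

```python
import numpy as np


def _log_local(f, a, b):
    # log P[f(k,l) | four neighbors] under the ASSUMED autologistic model
    s = (np.roll(f, 1, axis=0) + np.roll(f, -1, axis=0)
         + np.roll(f, 1, axis=1) + np.roll(f, -1, axis=1))
    t = a + b * s
    return f * t - np.logaddexp(0.0, t)


def _box_sum(x, M):
    # sum of x over the (2M+1) x (2M+1) window around every pixel, periodic boundary
    out = np.zeros(x.shape, dtype=float)
    for di in range(-M, M + 1):
        for dj in range(-M, M + 1):
            out += np.roll(x, (di, dj), axis=(0, 1))
    return out


def ml_discriminate(f, class_params, M):
    """Label map from (22): per pixel, the class maximizing log L_{c,k}{T_ij} of (21).

    The log of (21) is the sum of log local characteristics over the window
    sites that share the center pixel's checkerboard coding.
    """
    i, j = np.indices(f.shape)
    parity = (i + j) % 2
    scores = []
    for a, b in class_params:
        lp = _log_local(f, a, b)
        even_sum = _box_sum(lp * (parity == 0), M)  # coding of even-parity centers
        odd_sum = _box_sum(lp * (parity == 1), M)   # coding of odd-parity centers
        scores.append(np.where(parity == 0, even_sum, odd_sum))
    return np.argmax(np.stack(scores), axis=0)
```

Here `M` is the half-width of the decision window, so `M=1` gives a 3 x 3 window and `M=2` a 5 x 5 window.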

Due to its simplicity, the algorithm above is quite efficient. For an N x N image with K texture class types it requires approximately K computations to process the image. Note that the previously described MAP algorithm of Derin et al. requires about computations using dynamic programming, where D is an integer and D > 2. In addition, the ML algorithm proposed here can be implemented through parallel computation, since the assignment of a pixel in the image does not depend on that of the others.

When training data for different types of textures are available, the MRF model class vectors can be estimated from them, resulting in supervised texture discrimination. When training data is not available, a clustering scheme similar to the one described in the last section for unsupervised texture classification can be used. In particular, consider a sliding window of size $M_1 \times M_2$ on the observed image, as shown in Fig. 4.3-8, where we assume $M_1, M_2 \ll N$. At each position of the window, a MRF parameter vector is estimated from the data within the window. The parameter vectors obtained are then used as the sample vectors for the clustering algorithm; we have chosen the K-means algorithm. Finally, the centroids of the clusters are taken as the class model vectors, as if they had been estimated from training data. Here, again, we assume knowledge of the number of clusters. In this clustering approach the choice of window size is quite important. If the window is too large, a single window might contain data from several different texture classes, whereas if it is too small, a single window might not contain enough data for reliable estimation. Both result in unreliable sample vectors, which will affect the accuracy of the clustering algorithm. The size of the window should be related to the expected size of the texture regions in the image. At this point, we make the choice heuristically, trying different windows and selecting the one which provides reasonably good segmentations. Notice that the sliding window described above is used for unsupervised model parameter estimation before segmentation, whereas the decision window described previously is used during the segmentation.
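The sample-generation step can be sketched as follows, under stated assumptions: the texture is the autologistic first-order isotropic BMRF used in the earlier sketches; within each non-overlapping window, (a, b) is estimated on each checkerboard coding by maximizing the coding likelihood, which for this model is a logistic regression of f(i,j) on its neighbor sum solved by Newton-Raphson; and the two coding estimates are averaged. The small ridge term is an added assumption that keeps the iteration finite for nearly uniform windows. The resulting rows would then be clustered, for instance with K-means. Names are hypothetical.

```python
import numpy as np


def _neighbor_sum(f):
    return (np.roll(f, 1, axis=0) + np.roll(f, -1, axis=0)
            + np.roll(f, 1, axis=1) + np.roll(f, -1, axis=1))


def coding_estimate(x, s, n_iter=25, ridge=1e-3):
    """Maximize sum[x t - log(1 + e^t)], t = a + b s, by Newton-Raphson.

    x: binary values at the coding sites; s: their neighbor sums.
    The ridge penalty (ridge/2)|theta|^2 is an added assumption.
    """
    X = np.column_stack([np.ones(len(s)), s.astype(float)])
    theta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (x - p) - ridge * theta
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(2)
        theta = theta + np.linalg.solve(hess, grad)  # Newton ascent step
    return theta


def window_samples(f, M1, M2):
    """One (a, b) sample per non-overlapping M1 x M2 window (shift = window size)."""
    s_full = _neighbor_sum(f)
    i, j = np.indices(f.shape)
    parity = (i + j) % 2
    samples = []
    for r in range(0, f.shape[0] - M1 + 1, M1):
        for c in range(0, f.shape[1] - M2 + 1, M2):
            win = (slice(r, r + M1), slice(c, c + M2))
            est = [coding_estimate(f[win][parity[win] == m],
                                   s_full[win][parity[win] == m])
                   for m in (0, 1)]
            samples.append(np.mean(est, axis=0))  # average the two codings
    return np.array(samples)
```

The window-size trade-off discussed above appears directly here: each estimate uses only about M1 x M2 / 2 sites per coding.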

Texture discrimination experiments have been performed on images containing synthetic and natural textures. Each test image used in these experiments is 128 x 128 and contains two different textures distributed in the image according to the “region map” shown in Fig. 4.3-9. After the ML discrimination, each pixel in the resulting image is assigned one of two gray levels, depending on which texture class it belongs to. The results for both supervised and unsupervised discrimination are presented below. While the results are for binary, two-class images, the extension of the method to non-binary and multi-class problems is straightforward.

A.) Supervised Discrimination of Synthetic Textures:

The two-class test image is shown in Fig. 4.3-10b along with the region map. The two synthetic textures are generated using first-order isotropic BMRF's. The model parameters are estimated from training data, which are different realizations of the above BMRF's, and are listed in Table 4.3-5a. The results of applying the ML discriminator in (22) with different decision window sizes are shown in Figs. 4.3-10c and 4.3-10d. As can be seen, the 3 x 3 window provides very good discrimination. More extensive experimental results of the same nature can be found in [19], [21].

B.) Unsupervised Discrimination of Synthetic Textures:

The test image used in this experiment is the same as that in A.). The model vectors for the two texture classes are obtained from clustering using a nonoverlapping 32 x 32 sliding window; that is, we take $N_i = M_i$, i = 1, 2, here and in all experimental results to follow. The estimated values are shown in Table 4.3-5b, and the results of discrimination using these model vectors are shown in Figs. 4.3-10e and 4.3-10f. It can be seen that the clustering scheme is quite effective in the ideal case when the textures are realizations of MRF's.
C.) Supervised Discrimination for Natural Textures:

The test image shown in Fig. 4.3-11b, along with the region map, contains two binary quantized natural textures from Brodatz's photo album. These binary textures are modeled here by the first-order BMRF model. The model parameters are estimated from training data and are shown in Table 4.3-6a. The discrimination results for selected decision windows are shown in Figs. 4.3-11c and 4.3-11d. Notice that a larger decision window is now needed to obtain results comparable to those of A.). This might be caused by model mismatch, since the natural textures are not exact realizations of a first-order BMRF. However, the discrimination results are still quite good.

D.) Unsupervised Discrimination for Natural Textures:

The experiment in C.) is repeated with the model parameter vectors obtained from clustering with a nonoverlapping 16 x 16 sliding window. The resulting cluster centroids (model vectors) are shown in Table 4.3-6b, and the texture discrimination results for different decision window sizes are shown in Figs. 4.3-11e and 4.3-11f. Again, the clustering approach worked well.

4.3.4 Summary

In this paper, we have developed a MRF model-based ML approach to texture classification and discrimination problems. Under the MRF texture modeling assumption, both were formulated as statistical decision problems. To make the computation feasible, the likelihood functional, originally derived from the joint probability distribution of the MRF model, is approximated using Besag's coding method. Most of the statistical model-based approaches proposed previously are supervised; that is, they require a training data set for model parameter estimation. Unlike these approaches, we also consider unsupervised schemes which do not require training data. For the latter, a novel clustering technique is proposed to estimate the model parameters directly from the observed image. Experimental results on texture classification and discrimination using these two schemes are quite promising.

However, there are limitations to the MRF model-based approach. For example, in the unsupervised clustering scheme we assume the number of different texture classes is known, which is generally not the case. Although a number of methods exist which can be used to determine the number of classes, they are mostly ad hoc, and little is known about how well they work in general. In some cases, a reasonable assumption might be made about the number of classes based on a priori knowledge of the situation. However, to make the unsupervised scheme work in general, more reliable methods are needed to estimate this number. One such approach is described in [21].
Table 4.3-3

a.) Estimated from training data

Parameter    Grass    Wood    Bark    Sand
a            -1.94   -4.56   -2.95   -2.26
b1            0.75    0.36    1.21    1.23
b2            1.14    4.07    1.71    1.17

b.) Estimated by clustering

Parameter    Grass    Wood    Bark    Sand
a            -1.80   -3.92   -2.77   -2.35
b1            0.67    0.32    1.09    1.23
b2            1.11    3.62    1.71    1.12

Estimated Parameters for Natural Texture Samples Modeled as First-Order BMRF's.


Table 4.3-4

a.) Supervised Classification

True Class \ Assigned Class    Grass    Wood    Bark    Sand
Grass                            4        0       0       0
Wood                             0        4       0       0
Bark                             0        0       4       0
Sand                             0        0       0       4

b.) Unsupervised Classification

True Class \ Assigned Class    Grass    Wood    Bark    Sand
Grass                            4        0       0       0
Wood                             0        4       0       0
Bark                             0        0       4       0
Sand                             0        0       0       4

Classification Results for the Natural Texture Samples.
Table 4.3-5


Est.  Parameter 

Texture  Class 

a 

b 

Class  1 

-5.55 

Class 2

5.72 

a.)  Estimated  from  training  data 


b.) Estimated by clustering

Texture Class     a        b
Class 1         -4.67     2.32
Class 2          3.56    -1.87

Estimated Parameters from the First-Order BMRF for Synthetic Texture Discrimination.


Table 4.3-6

a.) Estimated from training data

Texture     a        b1       b2
Grass     -1.94     0.75     1.14
Ripple    -3.03     0.343    2.53

b.) Estimated by clustering

Texture     a        b1       b2
Grass     -1.82     0.53     1.31
Ripple    -2.88    -0.15     3.05

Estimated First-Order MRF Model Parameters for the Discrimination of Natural Textured Images.


Figure 4.3-2

a.) Cliques for the First-Order Neighbor Set

Examples of Cliques for First-Order and Second-Order Neighborhood Systems
Figure 4.3-3

a.) Codings for the First-Order Neighbor Set
b.) Codings for the Second-Order Neighbor Set

Examples of Different Codings.


Figure  4.3-4 


A  Texture  Classification  System 


Figure 4.3-5

Synthetic Textures Modeled as First-Order BMRF's


Figure  4.3-6 

Binary  Quantized  Samples  of  Natural  Textures. 


c.) Bark    d.) Sand


Figure  4.3-7 
Figure 4.3-8

A Sliding Window on the Image Plane.

Figure 4.3-9

a.) Region Map

b.) Two-Texture Image
c.) Supervised discrimination with a 5 x 5 decision window

d.) Supervised discrimination with an 11 x 11 decision window

e.) Unsupervised discrimination with a 5 x 5 decision window

f.) Unsupervised discrimination with an 11 x 11 decision window
4.4 Cluster Validation With Application to Image Segmentation:


Clustering procedures have found wide application in statistical data analysis and processing. The application of specific interest here is stochastic model-based image segmentation, where a clustering algorithm is used to estimate the model parameters for the various image classes in an observed image. In this and similar applications, it is generally the case that the clustering algorithm requires prior knowledge of the number of clusters or data classes. For many applications, however, the number of clusters is not known a priori, and we would like to determine it directly from the data. This is known as the cluster validation problem. For stochastic model-based image segmentation, the solution of this problem directly affects the quality of the segmentation. In this work we propose a model fitting approach to the cluster validation problem based upon Akaike's Information Criterion (AIC). The explicit evaluation of the AIC for the image segmentation problem is achieved through an approximate maximum-likelihood (ML) estimation algorithm. We demonstrate the efficacy of the proposed approach through experimental results both for synthetic mixture data, where the number of clusters is known, and for stochastic model-based image segmentation of real-world images, for which the number of clusters is unknown. This approach is shown to correctly identify the known number of clusters in the synthetically generated data and to result in good subjective segmentations of aerial photographs.

4.4.1 Background:

Clustering procedures are widely used in various applications of pattern classification and statistical data analysis. In a clustering procedure, the observed data or entities are grouped together to form a number of clusters in such a way that the entities within a cluster are more similar to each other than to those in other clusters. The measure of similarity, usually heuristically defined, is called the cluster criterion.

Over the past three decades, many clustering algorithms have been developed by researchers in such diverse fields as biology, statistical data analysis and pattern recognition, using very different cluster criteria [1]. In some previous work [2]-[4] on stochastic model-based image segmentation, clustering algorithms have been used to estimate the model parameter vectors for different image classes directly from the observed image. Since the nature of this work is related to statistical pattern recognition, the clustering algorithm used was selected from those developed within the pattern recognition community. One of the most successful clustering algorithms in this respect is the K-means algorithm [5], [6]. This algorithm is optimum in the sense that it minimizes the total within-cluster variance (in general, to a local minimum), and it has been widely used in unsupervised pattern recognition. However, an important problem with most clustering algorithms, including the K-means algorithm, is that the number of clusters in the data must be specified a priori.

In some situations this number can be derived from prior knowledge about the data, or can sometimes even be determined by visual inspection of a two-dimensional projection of the data. However, in many applications, such as our previous work on image segmentation, it is desirable to estimate this number directly from the observed data, since a priori knowledge is generally not available and the data are often vectors of dimension higher than two, so that the projection method is not satisfactory. Furthermore, even when the data are two-dimensional, visual inspection may not succeed if the clusters cannot be distinguished by eye. This problem is of great practical importance for many clustering algorithms and is known as the cluster validation problem [7]. For stochastic model-based image segmentation, such as the schemes described in [2]-[4], the solution of this problem directly affects the quality of the resulting segmentation. If the estimated number of clusters, or data classes, is smaller than the true value, the objects in the image will not be well separated. Likewise, if this estimated number is too large, a single object may be split into a number of smaller regions. Both of these situations are to be avoided.

Most of the previously proposed solutions to the cluster validation problem can be classified into two approaches: a heuristic approach and a statistical hypothesis testing approach. In the heuristic approach, the number of clusters is determined using some ad hoc criteria. For example, for the K-means algorithm it has been proposed to examine a plot of the average within-cluster variance for different values of K, the number of clusters. The value of K at which the curve begins to saturate can then be taken as the estimated number of classes. Many ad hoc variations of the K-means algorithm have been proposed based on similar ideas. In these algorithms, the number K is increased or decreased according to criteria such as intra-cluster variance and distance between clusters. While some practical problems can be solved using the heuristic approach, it does not provide a general solution to the cluster validation problem, and even when it is applied to specific problems, the criteria have to be fine-tuned through trial and error. This, in part, reflects the difficult nature of the problem. More specifically, as pointed out by Everitt [1] and Jain [7], clusters are generally very difficult to define precisely.
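The saturation heuristic described above can be sketched as follows: run K-means for K = 1, ..., K_max and record the average squared distance of the samples to their cluster centroids; the "knee" of this curve is then read off by eye. The K-means routine (with an assumed farthest-point initialization) and all names are illustrative, not the specific variants cited.

```python
import numpy as np


def _k_means(X, K, n_iter=100):
    # farthest-point initialization (an assumption made for determinism)
    centroids = [X[0]]
    for _ in range(1, K):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


def within_cluster_variance_curve(samples, K_max):
    """Average squared distance of each sample to its centroid, for K = 1..K_max."""
    X = np.asarray(samples, dtype=float)
    curve = []
    for K in range(1, K_max + 1):
        centroids, labels = _k_means(X, K)
        curve.append(float(((X - centroids[labels]) ** 2).sum(axis=1).mean()))
    return curve
```

For well-separated data the curve drops sharply up to the true K and then flattens; deciding where it "begins to saturate" is exactly the ad hoc judgment the text criticizes.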

To find generally applicable and mathematically rigorous solutions to cluster validation, many researchers have tried to formulate the problem as a statistical hypothesis testing problem [8], [9]. For example, hypothesis tests have been proposed to test whether a given cluster should be divided into two. More general likelihood tests have been attempted with the data modeled in terms of finite mixture distributions [9]. However, due to the structure of the mixture distribution, the parameters which characterize one hypothesis (for example, the null hypothesis) lie on the boundary of the parameter space of the other hypothesis; a K-component mixture, for instance, is a (K+1)-component mixture with one mixing weight equal to zero. This, in turn, violates the regularity conditions (cf. [9]) required for the validity of the asymptotic distribution theory of the generalized likelihood ratio (GLR) test, which exists for many simple hypothesis testing situations where each hypothesis can be described in terms of a single probability distribution. As a result, no GLR test is available at this point to determine the number of clusters directly from observation data.

On the other hand, the problem we face is not unlike the one faced in developing a theory to fit an autoregressive (AR) model to real-world data, in which the order of the model has to be decided before the model parameters can be estimated from the data. Having observed that neither the heuristic nor the hypothesis testing approach alone would provide a satisfactory solution to determining the order of the model, and hence to the practical fitting of a model to observation data, Akaike [10] suggested that the problem should be viewed as a multiple decision problem. That is, rather than asking which hypothesis is acting (which order is correct), we should ask which model best fits the data. The goodness of fit, as pointed out later by Akaike [11], should be a properly defined entropy function, and the best fit should be obtained by maximizing this quantity. Based on this maximum-entropy principle, Akaike proposed a criterion, called the AIC (Akaike's information criterion), to determine both the order and the parameters of an AR model for observed data. In its usual form, AIC = -2 log(maximum likelihood) + 2 (number of independently adjusted parameters), and the model with the minimum AIC is selected. Although the AIC has been criticized as being inconsistent, Akaike showed that it is robust and optimal in a minimax sense; that is, it is optimal when there is no a priori knowledge about the distribution of the model parameters. In addition, Akaike and others have extended the AIC to several Bayesian variations called the BIC (Bayesian Information Criterion) [13], [14]. This class of criteria can be shown to be AICs averaged with respect to various a priori distributions for the model parameters. Although the AIC and its variations have achieved substantial success, mostly in AR model fitting, their application is, of course, not limited to AR time series modeling.
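To make the AR order-selection use of the AIC concrete, here is a minimal sketch using the common Gaussian-AR form $\mathrm{AIC}(p) = N \log \hat\sigma_p^2 + 2p$ (constants dropped), with each AR(p) fitted by least squares on the same N targets so that the values are comparable. It illustrates the criterion only; it is not the cluster-validation procedure developed in this section, and the function name is hypothetical.

```python
import numpy as np


def ar_aic(x, max_order):
    """AIC(p) = N log(sigma_p^2) + 2p for least-squares AR(p) fits, p = 0..max_order."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = x[max_order:]          # common targets x[t], t = max_order .. len(x)-1
    N = len(y)
    aic = []
    for p in range(max_order + 1):
        if p == 0:
            resid = y
        else:
            # column l holds the lag-(l+1) values x[t - l - 1]
            X = np.column_stack([x[max_order - l - 1: len(x) - l - 1]
                                 for l in range(p)])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ coef
        aic.append(N * np.log(np.mean(resid ** 2)) + 2 * p)
    return np.array(aic)
```

The selected order is `int(np.argmin(ar_aic(x, max_order)))`: each extra coefficient must reduce $N \log \hat\sigma_p^2$ by more than 2 to be accepted.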

In this work, we have applied the AIC to the problem of cluster validation. The solution
is then used to find the number of distinct image classes in an observed image. There has
been little previous work on the application of the AIC to cluster validation. Sclove [17]
demonstrated a way to use the AIC to verify image segmentation results: after segmenting a
synthetic image under the assumption of two and of three classes, the AIC was used to
verify that the three-class segmentation is the better one. Our results differ from
Sclove's work in that we apply the AIC explicitly to the cluster validation problem and,
in the application to image segmentation, we use the AIC to decide the proper number of
classes in an image before segmentation. The AIC is evaluated explicitly by means of an
approximate maximum-likelihood (ML) estimation algorithm, described below. We demonstrate
the effective application of this procedure both to synthetic data, where the true number
of classes is known, and to real-world aerial photographs, where the number of classes is
unknown and can be assessed only subjectively.

In  the  next  section,  we  will  formulate  the  cluster  validation  problem  as  a  mixture 
model-fitting  problem  and  describe  how  to  determine  the  number  of  clusters  by  using  the 
AIC.  Then,  in  Section  4.4.3,  we  will  show  some  experimental  results  in  which  the  number  of 
clusters  is  determined  from  synthetic  data  or  real  image  data  using  the  AIC  criterion.  We 
will  also  show  real-world  image  segmentation  results  obtained  with  the  number  of  classes 
determined  by  the  AIC.  Finally,  a  summary  and  conclusions  are  provided  in  Section  4.4.4. 
4.4.2 The Model Fitting Approach:

In this work, we determine the number of clusters from the observed data by finding the
best-fitting random mixture model for the data using the AIC criterion. Assume that the
sample data are represented by $N$ independent and identically distributed (i.i.d.)
$m$-dimensional vectors, $\mathbf{Y} = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N\}$.
Furthermore, assume that a mixture distribution can be used to model the probability
distribution of $\mathbf{y} \in \mathbf{Y}$. That is,

$$p(\mathbf{y}) = \sum_{k=1}^{K} \pi_k \, p_k(\mathbf{y}), \qquad \mathbf{y} \in \mathbf{Y}, \tag{1}$$

where the $p_k(\mathbf{y})$'s are individual $m$-dimensional component pdf's with the
$\pi_k$'s as the weights, such that

$$\pi_k > 0, \qquad k = 1, 2, \ldots, K, \tag{2a}$$

and

$$\sum_{k=1}^{K} \pi_k = 1. \tag{2b}$$

The  number  K  is  the  number  of  mixture  components  and  is  used  as  an  indicator  of  the 
number  of  clusters.  That  is,  we  consider  each  cluster  in  the  data  as  a  component  of  the 
mixture  distribution  with  K  the  number  of  clusters. 
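To make (1) and (2) concrete, the short Python sketch below evaluates a mixture density from a list of weights and component pdf's and checks the weight constraints. The names `mixture_pdf` and `gaussian_pdf` are hypothetical, and the one-dimensional Gaussian components are assumed purely for illustration; any component pdf's could be passed in.

```python
import math
from typing import Callable, Sequence

def gaussian_pdf(mean: float, var: float) -> Callable[[float], float]:
    """Return a one-dimensional Gaussian pdf with the given mean and variance."""
    def pdf(y: float) -> float:
        return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
    return pdf

def mixture_pdf(weights: Sequence[float],
                components: Sequence[Callable[[float], float]],
                y: float) -> float:
    """Evaluate p(y) = sum_k pi_k p_k(y) as in (1), checking (2a) and (2b)."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive, see (2a)")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to one, see (2b)")
    return sum(w * p(y) for w, p in zip(weights, components))

# Two equally weighted unit-variance components centred at 0 and 2.
p = mixture_pdf([0.5, 0.5], [gaussian_pdf(0.0, 1.0), gaussian_pdf(2.0, 1.0)], 1.0)
```

The point y = 1 lies midway between the two components, so both contribute equally and `p` equals the standard normal density at distance 1 from its mean.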

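A special case of the mixture distribution is the Gaussian mixture, where the individual
pdf's, $p_k(\mathbf{y})$, $k = 1, 2, \ldots, K$, are all Gaussian [9]. For example, suppose
$K = 2$ and $m = 2$, and suppose the components of the individual sample vectors are
independent. In this case the Gaussian mixture is completely defined by the parameter
vector¹ $\boldsymbol{\alpha} = (\mathbf{m}_1, \mathbf{m}_2, \boldsymbol{\sigma}_1^2, \boldsymbol{\sigma}_2^2, \pi_1)$,
where $\mathbf{m}_k$ and $\boldsymbol{\sigma}_k^2$ are each of dimension $m = 2$ and represent,
respectively, the mean-value vector $\mathbf{m}_k = (m_{k1}, m_{k2})$ and the variance vector
$\boldsymbol{\sigma}_k^2 = (\sigma_{k1}^2, \sigma_{k2}^2)$ associated with $p_k(\mathbf{y})$,
$k = 1, 2$. The parameter vector $\boldsymbol{\alpha}$ is of dimension $K' = 9$ in this case
(four means, four variances, and one free weight, since $\pi_2 = 1 - \pi_1$ by (2b)).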

¹ We will make use of this notation in describing some experimental results in the next
section.
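The parameter count $K'$ that enters the AIC follows directly from this parameterization. A minimal sketch, assuming Gaussian components with independent vector components (diagonal covariance) as in the example above; the function name `num_free_params` is hypothetical.

```python
def num_free_params(K: int, m: int) -> int:
    """Number of independent parameters K' of a K-component Gaussian mixture
    in m dimensions whose vector components are independent.

    Each component contributes m means and m variances; only K - 1 weights
    are free because the weights must sum to one, see (2b).
    """
    if K < 1 or m < 1:
        raise ValueError("K and m must be positive")
    return 2 * K * m + (K - 1)

print(num_free_params(2, 2))  # the K = 2, m = 2 example: 8 + 1 = 9
```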




Under the mixture distribution modeling assumption, the problem of determining the
number  of  clusters  for  the  observation  Y  becomes  that  of  finding  the  best-fitting  mixture 
model  for  Y.  The  resulting  K  in  that  model  would  then  be  taken  as  a  good  estimate  for 
the  number  of  clusters.  According  to  Akaike,  the  best  fit  should  be  the  one  that  maximizes 
a  generalized  entropy  or  minimizes  the  AIC  criterion  defined  as 

$$\mathrm{AIC}(K) = -2 \log\left[\text{maximum likelihood of the model}(K)\right] + 2K', \tag{3a}$$

where

$$\text{maximum likelihood of the model}(K) = p_K(\mathbf{Y} \mid \hat{\boldsymbol{\alpha}}_K). \tag{3b}$$

Here, $\hat{\boldsymbol{\alpha}}_K$ is the maximum-likelihood (ML) estimate of the model
parameter vector, $\boldsymbol{\alpha}$, of the mixture model given $K$, the number of
components, and $K'$ is the number of independent parameters of the $K$-component mixture
model. In the case of the Gaussian mixture model, the vector $\hat{\boldsymbol{\alpha}}_K$
consists of ML estimates for the parameters of the Gaussian component pdf's and the first
$K-1$ weights, $\pi_1, \pi_2, \ldots, \pi_{K-1}$ (the last weight, $\pi_K$, is then fixed
by (2b)). Now, for a given set of sample data vectors, $\mathbf{Y}$, the optimal estimate of
the number of clusters is

$$K_0 = \arg\min_{1 \le K \le K_{\max}} \mathrm{AIC}(K), \tag{4}$$

where $K_{\max}$ is a prespecified upper limit for $K$. Rigorous justification of the AIC
for model fitting can be found in a series of papers by Akaike [10]-[14]. This method can be
easily implemented provided we can find the ML estimate of the mixture model parameters,
which is known in statistical data analysis as the mixture estimation problem [9].
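Once the maximized log-likelihood and $K'$ are available for each candidate $K$, (3a) and (4) reduce to a few lines. The sketch below uses hypothetical log-likelihood values (not results from this report) and hypothetical function names, with parameter counts for $m = 2$ diagonal Gaussian components.

```python
from typing import Mapping

def aic(max_log_likelihood: float, n_free_params: int) -> float:
    """AIC(K) = -2 log[maximum likelihood] + 2K', as in (3a)."""
    return -2.0 * max_log_likelihood + 2.0 * n_free_params

def select_num_clusters(aic_by_k: Mapping[int, float]) -> int:
    """K_0 = argmin over the candidate K's of AIC(K), as in (4)."""
    return min(aic_by_k, key=aic_by_k.__getitem__)

# Hypothetical maximized log-likelihoods and parameter counts K' for K = 1, 2, 3.
log_lik = {1: -520.0, 2: -480.0, 3: -478.5}
n_params = {1: 4, 2: 9, 3: 14}
scores = {k: aic(log_lik[k], n_params[k]) for k in log_lik}
k0 = select_num_clusters(scores)  # scores: {1: 1048.0, 2: 978.0, 3: 985.0}, so k0 == 2
```

In this made-up example the three-component model fits slightly better, but not by enough to pay for its five extra parameters, so the criterion prefers $K_0 = 2$.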

The ML estimation approach has been very successful for estimating the parameters of
stochastic models whose pdf's contain only one component. Explicit solutions can often be
found by solving the likelihood equation, and the ML estimate is in many cases consistent
[18]. Even in the case where the true distribution of the data is not the same as the
model, the consistency property often still holds under mild regularity conditions [19].
This result is especially important since, when we use a model to approximate an unknown
probability distribution by ML estimation, we hope that the estimates are consistent.
Unfortunately, these results do not readily extend to mixture distributions [9]. First of
all, an explicit solution is impossible even for the two-component case. Secondly, the
likelihood surface often has singular points, which make numerical solution difficult.

A major reason for this is that the data are incomplete, in the sense that we do not know
a priori to which cluster a data vector belongs. However, a number of approximate ML
algorithms do exist. One of the more popular methods is the so-called EM
(expectation-maximization) algorithm [9]. It has been shown that, under mild regularity
conditions, it does provide local maxima that are consistent. However, a disadvantage of
the EM algorithm is its relatively slow convergence.
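For comparison with the clustering-based scheme adopted below, here is a minimal sketch of the textbook EM iteration for a one-dimensional Gaussian mixture. It is not the estimator used in this work; the function name, the fixed iteration count `n_iter`, and the variance floor `var_floor` (a guard against the singular points of the likelihood surface mentioned above) are assumptions of the sketch. Computations are done in the log domain to avoid underflow.

```python
import numpy as np

def em_gaussian_mixture_1d(y, means, variances, weights, n_iter=200, var_floor=1e-6):
    """EM iterations for a one-dimensional Gaussian mixture.

    means, variances, weights: initial guesses for the K components.
    Returns the final (means, variances, weights) and the mixture
    log-likelihood evaluated at the final parameters.
    """
    y = np.asarray(y, dtype=float)[:, None]  # shape (N, 1) for broadcasting
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    pi = np.asarray(weights, dtype=float)

    def log_weighted_densities():
        # log(pi_k p_k(y_i)) for every sample i and component k, shape (N, K).
        return np.log(pi) - 0.5 * (np.log(2 * np.pi * var) + (y - mu) ** 2 / var)

    for _ in range(n_iter):
        # E-step: posterior membership probabilities ("responsibilities").
        log_w = log_weighted_densities()
        log_norm = np.logaddexp.reduce(log_w, axis=1, keepdims=True)
        resp = np.exp(log_w - log_norm)
        # M-step: responsibility-weighted ML updates of all parameters.
        nk = resp.sum(axis=0)
        mu = (resp * y).sum(axis=0) / nk
        var = np.maximum((resp * (y - mu) ** 2).sum(axis=0) / nk, var_floor)
        pi = nk / y.shape[0]

    log_lik = float(np.logaddexp.reduce(log_weighted_densities(), axis=1).sum())
    return mu, var, pi, log_lik

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 200)])
mu, var, pi, ll = em_gaussian_mixture_1d(data, means=[-1.0, 6.0],
                                         variances=[1.0, 1.0], weights=[0.5, 0.5])
```

Every sample contributes to every component through its responsibilities, which is what distinguishes EM from the hard-assignment scheme described next.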

In this work we use an approximate ML estimation scheme based on a clustering algorithm.
First, the K-means clustering algorithm is applied to divide the data into K groups. Each
group is then assumed to correspond to the sample data for one and only one mixture
component. An ML estimate is evaluated on each group separately to estimate the parameters
of the corresponding mixture component. Finally, the component weights, i.e., the
$\pi_k$'s, are estimated as the ratio of the number of samples in a group to the total
number of samples. This approximation transforms the problem of ML estimation of a mixture
into that of ML estimation of several individual p.d.f.'s. It will be shown in the next
section, through experimental results, that it provides reasonably good estimates. This
scheme also converges fast, since the clustering algorithm is known to possess fast
convergence properties.
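The per-group estimation step can be sketched as follows, assuming the partition has already been produced by K-means (or any other hard clustering) and that the components have independent Gaussian coordinates, as assumed later in this section. `group_ml_estimates` is a hypothetical name.

```python
import numpy as np

def group_ml_estimates(Y, labels, K):
    """Approximate mixture ML estimate from a hard partition of the data.

    Y: (N, m) array of sample vectors; labels: length-N array of group
    indices in 0..K-1 (for example, the output of K-means). Each group is
    treated as a sample from one component with independent Gaussian
    coordinates. Returns (means, variances, weights) of shapes (K, m), (K, m), (K,).
    """
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    N, m = Y.shape
    means = np.zeros((K, m))
    variances = np.zeros((K, m))
    weights = np.zeros(K)
    for k in range(K):
        Yk = Y[labels == k]
        if len(Yk) == 0:
            raise ValueError(f"group {k} is empty")
        means[k] = Yk.mean(axis=0)
        variances[k] = Yk.var(axis=0)   # ML estimate: divides by N_k, not N_k - 1
        weights[k] = len(Yk) / N        # pi_k = N_k / N
    return means, variances, weights

Y = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [10.0, 12.0], [12.0, 10.0]])
labels = np.array([0, 0, 1, 1, 1])
means, variances, weights = group_ml_estimates(Y, labels, K=2)
```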

The Gaussian mixture model is the most studied mixture model because it is a realistic
model for many applications and it is mathematically tractable. In this work, we make
explicit use of the Gaussian mixture model. To further simplify the mathematics, we assume
the components of the individual observation vectors are independent.² Under these
assumptions, the procedure for determining the number of clusters in a set of observed
data can be stated as follows:

1.) For each $K = 1, 2, \ldots, K_{\max}$, apply the K-means clustering algorithm with the
number of classes preset to $K$.

2.) Estimate the mean and variance vectors for each cluster and the weight of each cluster.

3.) Compute $\mathrm{AIC}(K)$.

4.) Select as the estimate $K_0$ of the number of clusters in the data the value of $K$
that minimizes $\mathrm{AIC}(K)$ over all $K = 1, 2, \ldots, K_{\max}$.

² The component p.d.f.'s are then completely described by their mean-value vectors
$\mathbf{m}_k = (m_{k1}, m_{k2}, \ldots, m_{km})$ and variance vectors
$\boldsymbol{\sigma}_k^2 = (\sigma_{k1}^2, \sigma_{k2}^2, \ldots, \sigma_{km}^2)$, $k = 1, 2, \ldots, K$.
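Steps 1.) to 4.) can be put together in a short, self-contained Python sketch. It assumes diagonal Gaussian components, the classification likelihood (6) discussed below, a plain K-means (Lloyd's algorithm) with a deterministic farthest-point initialization, a small variance floor, and the rule that a value of $K$ whose partition leaves a cluster with fewer than two samples is discarded. These choices and all function names are illustrative assumptions, not the implementation used in this report.

```python
import numpy as np

def kmeans(Y, K, n_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialization.
    Returns the cluster label (0..K-1) of every row of Y."""
    centers = [Y[0]]
    for _ in range(1, K):
        # Next initial center: the sample farthest from all centers chosen so far.
        d2 = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def aic_for_k(Y, K, var_floor=1e-6):
    """Steps 1.)-3.): K-means partition, per-cluster ML estimates, AIC(K)."""
    N, m = Y.shape
    labels = kmeans(Y, K)
    log_lik = 0.0
    for k in range(K):
        Yk = Y[labels == k]
        if len(Yk) < 2:
            return np.inf  # degenerate partition: treat this K as unusable
        mu, var = Yk.mean(axis=0), np.maximum(Yk.var(axis=0), var_floor)
        log_lik += len(Yk) * np.log(len(Yk) / N)  # N_k log pi_k
        log_lik += -0.5 * np.sum(np.log(2 * np.pi * var) + (Yk - mu) ** 2 / var)
    n_params = 2 * K * m + (K - 1)  # K' for diagonal Gaussian components
    return -2.0 * log_lik + 2.0 * n_params

def estimate_num_clusters(Y, K_max):
    """Step 4.): K_0 = argmin of AIC(K) over K = 1..K_max."""
    Y = np.asarray(Y, dtype=float)
    scores = [aic_for_k(Y, K) for K in range(1, K_max + 1)]
    return int(np.argmin(scores)) + 1, scores

rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
Y = np.vstack([rng.normal(c, 1.0, size=(150, 2)) for c in centres])
K0, scores = estimate_num_clusters(Y, K_max=6)
```

With three well-separated synthetic clusters, splitting a true cluster lowers the classification likelihood (the $N_k \log \pi_k$ terms outweigh the reduced variances) on top of adding parameters, so the minimum is expected at $K_0 = 3$.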

There are two applicable expressions for the likelihood functional in the use of the AIC
criterion. If we consider the data vectors to be incomplete, that is, the class status of
the samples is unknown, we have the standard likelihood expression for the mixture, which,
from (1), becomes

$$p_K(\mathbf{Y} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \, p_k(\mathbf{y}_i). \tag{5}$$

On the other hand, if we first classify the data by applying the K-means algorithm, we in
effect assign the data vectors to hypothesized classes. In this case a data vector assigned
to class $k$ can be considered as coming from that particular class, which has a
probability $\pi_k$ of occurring. The corresponding expression for the likelihood
functional for correctly classified samples then becomes

$$p_K(\mathbf{Y} \mid \boldsymbol{\alpha}) = \prod_{k=1}^{K} \pi_k^{N_k} \prod_{j=1}^{N_k} p_k(\mathbf{y}_{kj}), \tag{6}$$

where $N_k \le N$ is the number of samples in the $k$-th cluster and $\mathbf{y}_{kj}$,
$j = 1, 2, \ldots, N_k$, are the data vectors associated with this cluster. Since we have
used the K-means algorithm for approximate ML estimation, each sample vector is assigned to
a unique class. In what follows, we make use of the second likelihood functional, as
expressed by (6). The ML estimate $\hat{\boldsymbol{\alpha}}_K$ to be used in (3) in computing
$\mathrm{AIC}(K)$ is then formed from the resulting $K$ class-conditional parameter estimates
together with the estimated weights, as described above.
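The two likelihood functionals can be compared numerically. The sketch below, assuming diagonal Gaussian components and hypothetical function names, evaluates the logarithms of (5) and (6) for given parameters and labels. Because every term of the inner sum in (5) is nonnegative, (5) can never be smaller than (6) for the same parameters, and the two nearly coincide when the clusters are well separated.

```python
import numpy as np

def component_log_pdf(Y, mean, var):
    """Log-density of each row of Y under a Gaussian with independent
    coordinates, given mean and variance vectors of length m."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (Y - mean) ** 2 / var, axis=1)

def mixture_log_likelihood(Y, means, variances, weights):
    """log of (5): sum_i log sum_k pi_k p_k(y_i), for unlabeled data."""
    log_terms = np.stack([np.log(w) + component_log_pdf(Y, mu, v)
                          for mu, v, w in zip(means, variances, weights)], axis=1)
    return float(np.logaddexp.reduce(log_terms, axis=1).sum())

def classification_log_likelihood(Y, labels, means, variances, weights):
    """log of (6): sum_k [N_k log pi_k + sum_j log p_k(y_kj)], for labeled data."""
    total = 0.0
    for k, (mu, v, w) in enumerate(zip(means, variances, weights)):
        Yk = Y[labels == k]
        total += len(Yk) * np.log(w) + component_log_pdf(Yk, mu, v).sum()
    return float(total)

Y = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [10.0, 12.0], [12.0, 10.0]])
labels = np.array([0, 0, 1, 1, 1])
means = np.array([[1.0, 1.0], [32 / 3, 32 / 3]])
variances = np.array([[1.0, 1.0], [8 / 9, 8 / 9]])
weights = np.array([0.4, 0.6])
ll5 = mixture_log_likelihood(Y, means, variances, weights)
ll6 = classification_log_likelihood(Y, labels, means, variances, weights)
```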

The  method  proposed  here  is  quite  general  in  that  we  can  use  assumptions  for  the 
single  mixture  components  other  than  Gaussian.  Furthermore,  other  AIC  related  criteria, 
such  as  BIC’s,  can  be  properly  adapted  to  it.  Finally,  we  note  that  the  AIC  criterion 
has  a  useful  intuitive  appeal.  More  specifically,  when  two  or  more  models  are  almost 
equally likely, in the sense that they have approximately the same maximum likelihood, the
AIC criterion selects the one with the fewest parameters, i.e., the least complex model.

4.4.3 Experimental Results:

In this section we demonstrate the effectiveness of the model-fitting approach to
determining the number of clusters from observed data. First, we perform an experiment on
synthetic  data  where  the  sample  vectors  are  indeed  generated  from  a  Gaussian  mixture 
distribution  as  described  in  the  last  section.  Then,  we  will  apply  the  same  approach  to  the 
image  segmentation  problem  to  identify  the  number  of  image  classes  present  in  an  observed 
image.  The  synthetic  data  set  is  used  to  study  the  ideal  performance  of  this  approach, 
while  the  image  data  is  used  to  assess  its  application  to  a  particular  real-world  problem. 
We  now  present  the  results  for  these  two  cases  separately. 

A.)  Synthetic  Data: 

In this experiment, three two-dimensional ($m = 2$) Gaussian mixture data sets with two,
three and four components, or clusters, are generated as shown in Fig. 4.4-1. We chose
two-dimensional data since they are then easy to display on a plane. The experiment has two
objectives: first, to see whether the approximate ML estimates provide reasonable estimates
of the true model parameters and, secondly, to see whether the AIC provides correct
estimates of the number of clusters, at least in this ideal case. The parameter estimates
for all the test data sets under the correct assumption on the number of clusters are shown
in Table 4.4-1. It can be observed that when the assumed number of classes equals the true
(but unknown) value, the parameter estimates are quite accurate. This indicates that the
approximate ML estimation scheme using clustering is quite effective. In Table 4.4-2, we
show the AICs computed for all the test data under the assumption of different numbers of
clusters, with $K_{\max} = 8$.

We find that the AIC makes the correct decision each time. This indicates that when the
data are indeed a Gaussian mixture, the method proposed here tends to estimate the number
of clusters correctly. Additional examples, with similar results, are given for a much
larger variety of Gaussian mixtures in [19].

B.) Application to Image Data:

In this experiment, we apply the method proposed in the previous section to a stochastic
model-based image segmentation procedure developed previously in [2]-[4]. In our work, we
take the point of view that image segmentation is the process of assigning the pixels of
the image to a finite, and usually small, number of image model classes. In a stochastic
model-based approach, different image classes are modeled by random field models. For
simplicity, we consider the Gaussian model used in [2] for tonal properties
of  the  image  classes.  That  is,  each  image  class  is  modeled  by  an  i.i.d.  Gaussian  random 
field.  Then,  each  image  class  can  be  characterized  in  terms  of  a  model  parameter  vector 
consisting  of  only  two  components:  the  mean  and  variance. 
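Under this i.i.d. Gaussian class model, the log-likelihood of the pixels in a window depends only on the class mean and variance, and a pixel can be assigned to the class that maximizes it. A minimal sketch of such a likelihood test follows; the function names, the plain maximum-likelihood decision with no prior or spatial constraint, and the clipping of the window at the image border are assumptions, not a description of the segmentation procedure of [2].

```python
import numpy as np

def window_log_likelihood(window, mean, var):
    """Log-likelihood of the pixels in `window` under an i.i.d. Gaussian
    class model with the given mean and variance."""
    w = np.asarray(window, dtype=float)
    return -0.5 * (w.size * np.log(2 * np.pi * var) + np.sum((w - mean) ** 2) / var)

def classify_pixel(image, row, col, half, class_models):
    """Assign pixel (row, col) to the class whose (mean, variance) model best
    explains the (2*half+1) x (2*half+1) decision window centred on it.
    The window is clipped at the image border."""
    img = np.asarray(image, dtype=float)
    r0, r1 = max(row - half, 0), min(row + half + 1, img.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, img.shape[1])
    window = img[r0:r1, c0:c1]
    scores = [window_log_likelihood(window, m, v) for m, v in class_models]
    return int(np.argmax(scores))

image = np.zeros((20, 20))
image[:, 10:] = 100.0                   # left half dark, right half bright
models = [(0.0, 25.0), (100.0, 25.0)]   # hypothetical (mean, variance) per class
left, right = classify_pixel(image, 5, 3, 2, models), classify_pixel(image, 5, 15, 2, models)
```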

In the segmentation process, the pixels are assigned to model classes through a likelihood
test based on the Gaussian model. In particular, a likelihood test for each pixel is
performed on the data contained in a decision window of fixed size centered at that pixel
position. Before the image can be segmented, however, the model vectors corresponding to
the different image classes have to be estimated from the image. It was suggested in [2]
that this can be done by clustering the sample model vectors estimated from a sliding
estimation window at different spatial locations in the image; the resulting cluster
centers can then be taken as the model vectors for the image classes. The clustering
algorithm used was the K-means algorithm, in which the number of image classes, or
clusters, needs to be specified beforehand. The method proposed here provides an objective
way to determine the number of image classes.
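A sketch of how the sample model vectors might be collected, assuming a square estimation window (16 x 16 in the experiments described next) moved in non-overlapping steps; the step size and the function name are assumptions, since the window spacing is not stated here. Each row of the result is a two-component (mean, variance) vector, i.e. $m = 2$ in the notation of the model-fitting section, and these vectors are what the K-means/AIC procedure would then cluster.

```python
import numpy as np

def sample_model_vectors(image, size=16, step=None):
    """Estimate a (mean, variance) model vector from each size x size window.

    step defaults to size (non-overlapping windows); that default is an
    assumption. Returns an array of shape (number_of_windows, 2).
    """
    img = np.asarray(image, dtype=float)
    step = size if step is None else step
    vectors = []
    for r in range(0, img.shape[0] - size + 1, step):
        for c in range(0, img.shape[1] - size + 1, step):
            w = img[r:r + size, c:c + size]
            vectors.append((w.mean(), w.var()))  # ML mean and variance of the window
    return np.array(vectors)

image = np.zeros((32, 64))
image[:, 32:] = 100.0   # two flat tonal regions
V = sample_model_vectors(image)
```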

In Figs. 4.4-2a and 4.4-3a, we show two aerial photographs. The first contains a building,
roads and vegetation, while the second contains an oil tank complex surrounded by
vegetation. The computed AICs for different numbers of clusters are shown in Table 4.4-3,
with $K_{\max} = 10$. The sliding estimation window is of size 16 x 16 pixels. The results
suggest that four tonal classes are present in the first image, while five tonal classes
best fit the data for the second image. The images are segmented using the model vectors
estimated for the number of classes suggested by the AIC criterion; the segmentations are
shown in Figs. 4.4-2 and 4.4-3, along with the original images. In these segmentations,
different tonal areas are well separated. For comparison purposes, we also show the results
of segmentation using from two up to six classes. The results for both images show that,
when the assumed number of classes is smaller than that determined by the AIC, a number of
significant regions of reasonably large size are missing from the segmentation. On the
other hand, when the number of classes is larger than that suggested by the AIC, increasing
the number of classes produces no significant change in the segmentation apart from the
appearance of some small, noisy regions. This suggests that the AIC model-fitting approach
is a reasonable objective approach for practical applications such as image segmentation.

4.4.4 Summary:

In  this  paper  we  described  a  model-fitting  approach  for  determining  the  number  of 
clusters in observed random data and its application to stochastic model-based image
segmentation. The problem, also known as cluster validation, is solved by finding a
best-fitting mixture distribution model for the observed data. The goodness of fit is
determined by the AIC criterion. An approximate ML parameter estimation scheme using clustering is
proposed  to  compute  the  AIC.  Experimental  results  are  also  described  to  demonstrate  the 
ideal  performance  and  practical  applicability  of  this  method.  In  the  experiments,  the  AIC 
correctly  determines  the  number  of  clusters  in  the  synthetic  mixture  data  and  provides  a 
subjectively  reasonable  number  of  classes  for  a  number  of  real-world  images.  These  results 
indicate  that  the  proposed  approach  is  quite  general  and  effective. 

This  work  also  brings  up  several  interesting  issues  which  need  further  investigation. 
First of all, it would be of interest to apply the BIC criteria to cluster validation and
compare the results with those of the AIC. To do so, we need to decide over which parameter
set the averaging of the likelihood is to be performed and how to implement the numerical
integration  involved  in  the  averaging.  It  would  also  be  of  interest  to  use  the  EM  algorithm 
as  the  estimation  method  for  computing  the  AIC  and  compare  the  results  on  Gaussian 
mixture  model-fitting  with  those  described  in  this  paper.  Finally,  work  is  underway  in 
applying  the  model-fitting  approach  to  image  segmentation  where  the  image  classes  are 
modeled  as  autoregressive  random  fields  [3].  This  work  will  be  reported  on  at  some  later 
time. 
Figure 4.4-1

Examples of Synthetic Gaussian Mixture Data
4.5 Overall Summary and Conclusions:

We have been very active over the last year in further refining our concept of a
region-based hierarchical approach to image interpretation. A major thrust has been the
development and implementation of improved image segmentation schemes. We expect to
continue this effort into FY'88 and, in particular, to concentrate more on improvements
in the interpretation process. Issues to be investigated will include:

1.  Investigate  techniques  for  fusing  information  from  different  image  segmentation 
schemes  to  provide  performance  improvements  over  that  achievable  with  any 
single  scheme. 

2.  Investigate  improvements  in  the  information  theoretic  criteria  for  unsupervised 
determination  of  the  number  of  different  image  classes  present. 

3.  Develop  and  investigate  techniques  for  choosing  the  appropriate  model  type  for 
stochastic  model-based  image  segmentation  schemes. 

4.  Devise  and  investigate  techniques  for  incorporating  knowledge  information  into 
image segmentation schemes. In particular, investigate techniques for incorporating
feedback from the interpretation process.

5.  Additional,  and  perhaps  more  powerful,  features  have  to  be  incorporated  into  the 
image  segmentation  procedure. 

6.  Object  detection  and  boundary  extraction  procedures  need  to  be  incorporated 
into  the  image  segmentation  process. 

7.  More  comprehensive  region  and  mutual  attributes  need  to  be  employed  in  the 
image  interpretation  process. 

8. The manual image segmentation procedure needs to be improved and its interfaces
with the knowledge database worked out.

9. Our raw image database needs to be expanded.

10.  More  general  procedures  for  designing  the  clique  functions  need  to  be  worked  out. 

11. Optimum annealing schedules for effecting the simulated annealing search
procedure need to be developed.
12. Propagation of interpretations from one region to the next needs to be
investigated.

13.  We  need  to  provide  feedback  from  the  interpretation  process  to  the  segmentation 
process  to  improve  its  performance. 

14. We have to investigate how map data and/or archival, previously interpreted,
image data can be utilized to improve the photointerpretation process or to
implement change detection/interpretation procedures.
MISSION
of
Rome Air Development Center


RADC plans  and  executes  research,  development,  test  and  selected 
acquisition programs in support of Command, Control, Communications
and Intelligence (C³I) activities. Technical and engineering support within
areas of competence is provided to ESD Program Offices (POs) and other
ESD elements to perform effective acquisition of C³I systems. The areas
of  technical  competence  include  communications,  command  and  control, 
battle  management,  information  processing,  surveillance  sensors, 
intelligence  data  collection  and  handling,  solid  state  sciences, 
electromagnetics,  and  propagation,  and  electronic,  maintainability,  and 
compatibility. 


